INDEX
    Explanations

    expressions of personal favorites or preferences

    Negative Logits
    Token      Logit
    "er"       -0.61
    "ers"      -0.60
    "ou"       -0.58
    " de"      -0.54
    " I"       -0.54
    "Thanos"   -0.54
    " De"      -0.53
    "ER"       -0.52
    " and"     -0.50
    " ("       -0.50
    Positive Logits
    Token              Logit
    " favorites"       1.15
    " BrowserModule"   1.13
    " favorite"        1.12
    " Favorite"        1.07
    " Favorites"       1.06
    "favorite"         1.05
    " FAVORITE"        1.05
    " Theſe"           1.05
    " favourites"      1.03
    " favourite"       1.02
    Activation Density: 0.006%

    No Known Activations