INDEX
    Explanations

    mentions of personal preferences or favorite things

    mentions of personal favorites

    Negative Logits
    aton     -0.77
    raz      -0.76
    asse     -0.76
    aping    -0.74
    rene     -0.74
    acial    -0.73
    thur     -0.73
    apers    -0.72
    ural     -0.72
    rain     -0.70
    Positive Logits
     favorites     1.48
     favourites    1.32
     favorite      1.09
    龍契士         0.97
    favorite       0.96
     Favorite      0.90
     favourite     0.90
    itism          0.87
     fav           0.86
     Favor         0.84
    Act Density 0.006%

    No Known Activations