INDEX
    Explanations

    phrases expressing preferences for specific things or entities

    mentions of "favorite" things or preferences

    Negative Logits
    Token       Logit
    'heed'      -0.83
    'aping'     -0.82
    'hani'      -0.78
    'acial'     -0.78
    'urches'    -0.76
    'asse'      -0.73
    'ural'      -0.73
    'thur'      -0.73
    'aton'      -0.73
    'pex'       -0.72
    Positive Logits
    Token         Logit
    ' favorite'   1.25
    ' favorites'  1.03
    'favorite'    0.99
    'Favorite'    0.97
    ' Favorite'   0.97
    ' favourite'  0.96
    'é¾įå¥ij士'   0.87
    ' darling'    0.84
    '="#'         0.82
    'ļéĨĴ'        0.82
    Activation Density: 0.011%

    No Known Activations