INDEX
    Explanations

    mentions of personal preferences or favorites

    expressions of personal preferences or favorites

    Negative Logits
    Token         Logit
    'heed'        -0.90
    'aping'       -0.84
    'thur'        -0.79
    'ijk'         -0.79
    'asse'        -0.78
    'ural'        -0.78
    'aton'        -0.77
    'yrinth'      -0.76
    'amping'      -0.76
    ' enthusi'    -0.76

    Positive Logits
    Token         Logit
    ' favorite'   1.12
    'Favorite'    1.01
    ' Favorite'   0.96
    ' favorites'  0.92
    ' favourite'  0.90
    '龍契士'      0.86
    ' darling'    0.85
    'favorite'    0.85
    '="#'         0.76
    ' watering'   0.75
    Activation Density: 0.013%

    No Known Activations