INDEX
    Explanations

    phrases expressing emotional experiences and reflections

    Negative Logits
    Token       Logit
    "NB"        -0.14
    "oso"       -0.14
    "amak"      -0.14
    "wechat"    -0.14
    " 부"       -0.13
    "hey"       -0.13
    "oho"       -0.13
    "lis"       -0.13
    "oust"      -0.13
    "elay"      -0.13
    Positive Logits
    Token       Logit
    "aira"      0.17
    " Tru"      0.15
    "atur"      0.14
    "atar"      0.14
    " bet"      0.14
    "ход"       0.14
    " sew"      0.14
    " neon"     0.14
    "abet"      0.13
    "ilet"      0.13
    Activation Density 0.340%

    No Known Activations