INDEX
    Explanations

    Phrases that express positivity, particularly those containing the word "good."

    Negative Logits
    "ament"         -0.15
    "ount"          -0.15
    "бол"           -0.15
    "upert"         -0.15
    " butt"         -0.14
    "agt"           -0.14
    "isLoggedIn"    -0.14
    "åł"            -0.14
    " Educ"         -0.14
    "ÙģÙĪ"          -0.14

    Positive Logits
    "yms"            0.17
    "HRESULT"        0.16
    "698"            0.16
    "_Reference"     0.15
    "ymi"            0.15
    " Sor"           0.15
    "lander"         0.15
    "ẽ"              0.14
    " Fé"            0.14
    "_wo"            0.14

    Activation Density: 0.045%

    No Known Activations