INDEX
    Explanations

    phrases related to misunderstandings or disagreements in social interactions

    expressions of confusion or frustration about a situation

    Negative Logits
    Token       Logit
     ÂŃ         -1.11
    âĢij         -1.06
    (blank)     -0.98
    Footnote    -0.87
    ®,          -0.79
    Thirty      -0.76
    Enlarge     -0.76
    (blank)     -0.75
    "—          -0.73
    )—          -0.72
    Positive Logits
    Token       Logit
     didnt      1.71
     doesnt     1.69
     dont       1.68
     alot       1.47
     lol        1.35
     english    1.35
     tho        1.34
    nt          1.29
     dmg        1.17
     wont       1.17
    Activation Density: 1.078%

    No Known Activations