    Explanations

    instances of emotional reactions and social observations

    Negative Logits
        Token        Logit
        "its"        -0.20
        "here"       -0.16
        "nt"         -0.15
        "ade"        -0.15
        "...."       -0.15
        "ve"         -0.14
        " bazen"     -0.14
        "....↵"      -0.14
        "2"          -0.14
        "ge"         -0.14
    Positive Logits
        Token        Logit
        " period"     0.18
        " haha"       0.17
        " PLUS"       0.17
        "sans"        0.17
        " à"          0.17
        " er"         0.17
        " lol"        0.16
        " LOL"        0.16
        " ha"         0.16
        " eh"         0.16
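The two lists above appear to be the tokens whose logits this feature pushes down most and up most. A minimal sketch of that ranking step, assuming a hypothetical `logit_effects` mapping from token string to logit effect (only a few of the listed values are used; a real mapping would cover the whole vocabulary) and a hypothetical helper `top_k`:

```python
# Hypothetical per-token logit effects for this feature, copied from a few
# rows of the tables above. Leading spaces are part of the token strings.
logit_effects = {
    "its": -0.20, "here": -0.16, "nt": -0.15, " bazen": -0.14,
    " period": 0.18, " haha": 0.17, " lol": 0.16, " eh": 0.16,
}


def top_k(effects, k, most_positive=True):
    """Return the k (token, value) pairs with the largest values,
    or the smallest values when most_positive is False."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=most_positive)
    return ranked[:k]


print(top_k(logit_effects, 2, most_positive=False))  # [('its', -0.2), ('here', -0.16)]
print(top_k(logit_effects, 2))  # [(' period', 0.18), (' haha', 0.17)]
```

Because Python's sort is stable, tied values (such as " lol" and " eh" at 0.16) keep their insertion order.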
    Activation Density: 0.224%

    No Known Activations