    Negative Logits
    Token           Logit
    " IR"           -0.07
    "Expression"    -0.07
    " insane"       -0.06
    " Anal"         -0.06
    "hooks"         -0.06
    "tour"          -0.06
    " url"          -0.06
    "Variables"     -0.06
    " butt"         -0.06
    " kp"           -0.06
    Positive Logits
    Token           Logit
    " 경우"          0.07
    " sometime"     0.07
    " Gallup"       0.07
    " आख"           0.06
    " juga"         0.06
    " prezident"    0.06
    " pouco"        0.06
    " แบบ"          0.06
    "]:="           0.06
                    0.06
    Act Density 0.058%

    No Known Activations