INDEX
    Negative Logits
        " punishments"    -0.07
        "Connect"         -0.07
        " Remote"         -0.06
        " Kick"           -0.06
        " COMP"           -0.06
        " SendMessage"    -0.06
        " CET"            -0.06
        "Give"            -0.06
        " yap"            -0.06
        " fandom"         -0.06
    Positive Logits
        (token missing)   0.07
        "vably"           0.07
        " вещ"            0.07
        " الو"            0.07
        " utrecht"        0.07
        " eerie"          0.06
        " shine"          0.06
        ".This"           0.06
        "---</"           0.06
        " fools"          0.06
    Activation Density: 0.028%

    No Known Activations