    NEGATIVE LOGITS
    Token            Logit
    ' allegiance'    -0.07
    ' noticed'       -0.06
    'Pawn'           -0.06
    'جل'             -0.06
    'calls'          -0.06
    ' *='            -0.06
    ' silly'         -0.06
    '-Russian'       -0.06
    ' كتب'           -0.06
    ' più'           -0.06
    POSITIVE LOGITS
    Token            Logit
    'Sou'            0.07
    'Bro'            0.07
    'áct'            0.07
    'exao'           0.06
    'Nobody'         0.06
    '_TCP'           0.06
    'roupe'          0.06
    '.black'         0.06
    ' "":↵'          0.06
    ' hero'          0.06
    Activation Density: 0.032%

    No Known Activations