    Negative Logits
     confront    -0.08
     позволя     -0.07
    allowed      -0.07
     clever      -0.07
     tam         -0.07
     permet      -0.07
     strategies  -0.07
    ifest        -0.07
     ingen       -0.07
    _algo        -0.07
    Positive Logits
     ș           0.08
    ulose        0.08
     شر          0.08
    -packed      0.08
     Jesu        0.08
     Ordin       0.08
     tekk        0.08
    atae         0.07
    (not shown)  0.07
     الشر        0.07
    Act Density 0.001%

    No Known Activations