    Negative Logits
    getReference    -0.08
    zure            -0.06
     Block          -0.06
    _st             -0.06
     acted          -0.06
     на             -0.06
                    -0.06
    лоп             -0.06
     admittedly     -0.06
     Над            -0.06
    Positive Logits
     twist          0.09
     twists         0.07
     Kushner        0.07
     Restrictions   0.07
    aturday         0.07
     двор           0.06
    laması          0.06
     Twist          0.06
    Insp            0.06
     передбач       0.06
    Activation Density: 0.007%

    No Known Activations