INDEX
    Explanations
    NEGATIVE LOGITS
     питання
    -0.07
     autonomy
    -0.07
    edu
    -0.07
    					  
    -0.06
     flagged
    -0.06
    expected
    -0.06
     brutal
    -0.06
    英語
    -0.06
     decided
    -0.06
     Bret
    -0.06
    POSITIVE LOGITS
     Shoes
    0.07
     GCBO
    0.06
     strife
    0.06
    (lock
    0.06
     şar
    0.06
    ằm
    0.06
     scissors
    0.06
     Hairst
    0.06
     їй
    0.06
    lacağ
    0.06
    Activation Density: 0.001%

    No Known Activations