    Negative Logits
     entertained    -0.08
     Mang           -0.08
     Tribut         -0.08
     Wider          -0.08
    Prov            -0.08
     Corte          -0.07
                    -0.07
     Dib            -0.07
    ządz            -0.07
     dih            -0.07
    Positive Logits
    946             0.08
    112             0.08
     careg          0.07
    ase             0.07
     общ            0.07
    ="">↵           0.07
    111             0.07
    सँग             0.07
    {{--            0.07
    ={`             0.07
    Activation Density: 0.001%

    No Known Activations