    Negative Logits
     spoiled
    -0.08
     اللو
    -0.08
    առ
    -0.08
    леді
    -0.07
    -0.07
    ాచ్
    -0.07
     emblem
    -0.07
    াতা
    -0.07
     coveted
    -0.07
     crowned
    -0.07
    Positive Logits
     तनाव
    0.08
     bursts
    0.08
    _gain
    0.08
     kişi
    0.07
     Werte
    0.07
    ergic
    0.07
    Bet
    0.07
     nass
    0.07
     Barbar
    0.07
     widers
    0.07
    Activation Density: 0.000%

    No Known Activations