INDEX
    Explanations
    Negative Logits
    "he"              0.77
    " he"             0.70
    "\["              0.69
    " Immediately"    0.68
    " visual"         0.65
    "Inclusive"       0.63
    "\]"              0.63
    " прида"          0.62
    "immediately"     0.62
    " enthusi"        0.62
    Positive Logits
    " Даже"           0.93
    "notlocked"       0.86
    " کبھی"           0.85
    " навіть"         0.84
    " možda"          0.84
    " REP"            0.80
    "𝐓"               0.80
    "LVector"         0.80
    " 바꾸"            0.80
    " difícil"        0.79
    Activation Density: 0.178%

    No Known Activations