INDEX
    Explanations
    Negative Logits
    "many"       0.60
    "Many"       0.58
    "both"       0.57
    "simply"     0.57
    " ಕೂಡ"       0.57
    " многим"    0.57
    "stability"  0.55
    "𝔀"          0.55
    "cómo"       0.55
    "picker"     0.55
    Positive Logits
    "しか"        0.93
    " एकमात्र"    0.89
    " einzige"    0.82
    "のみ"        0.79
    " Only"       0.79
    " excepting"  0.78
    " einzigen"   0.74
    "만이"        0.74
    " лишь"       0.73
    " Hanya"      0.73
    Activation Density: 0.017%

    No Known Activations