Explanations
recognizable, iconic things
Negative Logits
আন    0.41
अनु    0.40
Rationale    0.40
acao    0.40
sched    0.40
afforded    0.39
designee    0.39
裨    0.39
advant    0.38
Луч    0.38
Positive Logits
famously    0.87
berühm    0.85
familiar    0.84
famoso    0.84
famous    0.83
знамени    0.80
مشہور    0.79
iconic    0.78
famosa    0.78
유명    0.77
Activations Density 0.476%