Explanations
words related to languages or linguistic activities
repeated references to "language."
Negative Logits
destroy     -0.67
ãģ®éŃĶ      -0.67
iatus       -0.66
bryce       -0.66
Presents    -0.66
aunt        -0.65
invest      -0.64
Ul          -0.61
pless       -0.61
Triumph     -0.61
Positive Logits
language    3.84
Language    3.02
language    2.57
languages   2.54
Language    2.51
Languages   2.19
anguage     2.04
linguistic  1.84
anguages    1.82
vocabulary  1.77
Activations Density: 0.021%