Explanations
languages and language-related terms
references to languages and multilingual topics
Negative Logits
kefeller    -0.78
apor        -0.76
romeda      -0.75
ramid       -0.75
arranted    -0.74
Reward      -0.73
apego       -0.73
horm        -0.72
oppable     -0.71
rolet       -0.71
Positive Logits
languages   1.60
language    1.53
language    1.49
diction     1.49
english     1.41
Arabic      1.39
English     1.37
english     1.34
English     1.34
Hindi       1.33
Activations Density: 0.665%