Explanations
mentions of different languages
mentions of languages
Negative Logits
Token       Logit
urion       -0.86
ecake       -0.85
roxy        -0.82
kus         -0.82
rodu        -0.81
arranted    -0.80
ilts        -0.80
apego       -0.80
olls        -0.80
romeda      -0.79
Positive Logits
Token         Logit
spoken        1.08
learners      1.07
language      1.00
language      0.94
interpreter   0.90
ĨĴ            0.90
proficiency   0.89
lear          0.89
anguage       0.87
Languages     0.86
Activations Density: 0.022%