Explanations
references to different languages
Negative Logits
Token       Logit
<bos>       -0.54
#           -0.40
indest      -0.39
ąp          -0.37
terci       -0.35
do          -0.35
chuckles    -0.34
return      -0.34
Ell         -0.34
]')         -0.33
Positive Logits
Token       Logit
Languages   1.78
languages   1.69
Languages   1.55
languages   1.50
Sprachen    1.19
anguages    1.03
lingue      1.02
langues     0.92
idiomas     0.92
lenguas     0.89
Activations Density 0.005%