INDEX
Explanations
mentions of the English language
Negative Logits
enges         -0.74
psy           -0.74
ĸļ            -0.70
eus           -0.70
onies         -0.69
usky          -0.68
achine        -0.68
vati          -0.68
Downloadha    -0.67
pty           -0.67
Positive Logits
translation   1.03
language      0.97
speaking      0.96
shire         0.96
muff          0.94
translations  0.91
subtitles     0.89
man           0.88
men           0.86
spe           0.82
Activations Density 0.031%