INDEX
Explanations
phrases mentioning the English language
references to the English language
Negative Logits
onies        -0.94
xtap         -0.87
apego        -0.87
aunder       -0.86
prus         -0.85
pty          -0.84
igslist      -0.81
psy          -0.81
ramid        -0.78
aundering    -0.74
Positive Logits
translation     0.91
muff            0.85
translations    0.83
English         0.82
spe             0.76
translator      0.74
Literature      0.73
language        0.72
shire           0.72
English         0.72
Activations Density: 0.016%