Negative Logits
Token       Logit
anger       -0.15
Ñĸк         -0.14
Hercules    -0.14
ôt          -0.14
cox         -0.13
ointed      -0.13
ÌĢ          -0.13
ampton      -0.13
kke         -0.13
occer       -0.13
Positive Logits
Token       Logit
ares        0.16
inst        0.15
departure   0.15
HomeAsUp    0.15
ueur        0.14
overst      0.14
iect        0.14
dữ          0.14
Virgin      0.14
aeda        0.14
Activations Density: 0.007%