Explanations
specific references to individuals or groups
references to the word "who."
Negative Logits
Token     Logit
MER       -0.74
BACK      -0.70
Glob      -0.64
emin      -0.63
urb       -0.62
alam      -0.62
rinse     -0.62
eme       -0.62
OOL       -0.61
namese    -0.61
Positive Logits
Token     Logit
soever    1.16
else      1.05
abouts    0.87
exactly   0.84
afort     0.83
owns      0.83
cares     0.82
cared     0.81
redes     0.77
owes      0.75
Activations Density 0.041%