INDEX
Explanations
words related to countries or territories
mentions of the term "men" in various contexts
Negative Logits
VICE     -0.82
RAY      -0.74
Pwr      -0.71
Dog      -0.70
ENC      -0.70
BILL     -0.69
NEY      -0.67
GRE      -0.65
TOP      -0.63
Berry    -0.63
Positive Logits
opausal  1.21
uscript  1.06
endez    0.98
gling    0.97
stru     0.95
thren    0.94
士       0.93
volent   0.91
istan    0.90
emen     0.84
Activations Density: 0.019%