Explanations
phrases containing the word "men"
repeated mentions of the word "men"
Negative Logits
IVERS        -0.84
REDACTED     -0.76
Deal         -0.75
Accessory    -0.72
REC          -0.71
VICE         -0.70
EV           -0.69
KEN          -0.69
Main         -0.68
Democratic   -0.67
Positive Logits
volent       1.30
opausal      1.26
endez        1.20
ager         1.13
uscript      1.05
aced         1.01
orah         1.00
aces         0.93
agers        0.91
folk         0.91
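Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest scores. The sketch below shows that idea under stated assumptions: the names logit_effects, decoder_dir and W_U are hypothetical, and it ignores any final LayerNorm scaling a real dashboard might apply, so it is not necessarily how these particular numbers were computed.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction (d_model,)
    through an unembedding matrix W_U (d_model, vocab_size) and return the
    k most negative and k most positive (token, score) pairs.
    Assumption: no LayerNorm or other scaling is applied before W_U."""
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 3-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.3, 0.0],
                [0.1,  0.4, -0.3, 0.2],
                [0.0,  0.0,  0.0, 0.9]])
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 1.3), ('a', 0.5)]
```

Because decoder_dir here is a unit vector along the first axis, the scores are simply the first row of W_U, which makes the ranking easy to check by hand.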
Activation Density: 0.051%
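Activation density is usually read as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the helper name activation_density and the threshold parameter are hypothetical), shows what a figure like 0.051% corresponds to: about 51 active positions out of every 100,000 tokens.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose feature
    activation is strictly greater than `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# 51 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:51] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.051%
```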