Explanations
the word "men."
repeated references to "men" throughout the text
Negative Logits
Token       Logit
Deal        -0.76
REDACTED    -0.75
IVERS       -0.74
EV          -0.73
Pwr         -0.71
Accessory   -0.69
ITED        -0.68
Allows      -0.68
REC         -0.67
Berry       -0.67
Positive Logits
Token       Logit
opausal     1.21
endez       1.19
volent      1.19
uscript     1.11
ager        1.07
orah        1.02
folk        0.99
aced        0.99
aces        0.97
gling       0.90
Activations Density 0.048%