INDEX
Explanations
references to or mentions of men and related terms
references to men and their presence in various contexts
New Auto-Interp
NEGATIVE LOGITS
IVERS         -0.84
VICE          -0.73
REDACTED      -0.69
Closure       -0.65
Main          -0.64
Berry         -0.63
worthiness    -0.63
★★            -0.61
KEN           -0.60
DATA          -0.59
POSITIVE LOGITS
opausal        1.45
endez          1.29
volent         1.27
aced           1.19
orah           1.19
ager           1.19
aces           1.17
stru           1.04
folk           1.00
uscript        0.99
Activation Density 0.080%
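A minimal sketch of how the two logit lists and the density figure above could be derived, assuming each token already has a scalar logit effect for this feature and that the feature counts as "active" on a token whenever its activation exceeds 0. The helper names `top_logits` and `activation_density` are hypothetical; the dashboard's actual pipeline is not described here.

```python
# Hypothetical helpers: an illustration, not the dashboard's own code.

def top_logits(effects: dict[str, float], k: int = 10) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # strongest suppressions first
    positive = ranked[::-1][:k]    # strongest promotions first
    return negative, positive


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # A few values taken from the lists above.
    effects = {"IVERS": -0.84, "VICE": -0.73, "opausal": 1.45, "endez": 1.29, "folk": 1.00}
    neg, pos = top_logits(effects, k=2)
    print(neg)  # [('IVERS', -0.84), ('VICE', -0.73)]
    print(pos)  # [('opausal', 1.45), ('endez', 1.29)]
    # 8 firing tokens out of 10,000 gives the density shown above.
    print(f"{activation_density([1.0] * 8 + [0.0] * 9992):.3f}%")  # 0.080%
```

Under these assumptions, a density of 0.080% means the feature fires on roughly 8 of every 10,000 tokens.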