INDEX
Explanations
words related to a person named "He"
references to the pronoun 'He' and possibly other male identifiers
Negative Logits
Token        Logit
etheless     -0.81
quo          -0.63
lining       -0.62
bombard      -0.61
wars         -0.60
toe          -0.59
WAYS         -0.59
litter       -0.58
injection    -0.58
intrusion    -0.57
Positive Logits
Token        Logit
atson        0.94
lder         0.92
cht          0.90
isman        0.89
isen         0.87
ppard        0.86
arer         0.85
eger         0.85
imer         0.85
ALTH         0.83
Activations Density: 0.084%