INDEX
Explanations
pronouns referring to males, particularly starting with the word "he"
references to a male individual
Negative Logits
Seym        -0.72
emale       -0.70
etheless    -0.66
raints      -0.57
igslist     -0.57
flix        -0.56
constit     -0.55
terms       -0.54
deen        -0.52
execute     -0.52
Positive Logits
zbollah     0.91
mos         0.77
Majesty     0.75
eded        0.73
sych        0.67
dor         0.66
brew        0.65
sing        0.65
'd          0.64
ALTH        0.64
Activations Density: 0.252%