INDEX
Explanations
references to a specific 'man' or male character
Negative Logits
']")
-0.96
BibitemShut
-0.96
^(@)
-0.96
piac
-0.87
"])
-0.85
']").
-0.82
))}
-0.82
}}]{
-0.82
"]}
-0.82
Theſe
-0.81
Positive Logits
man
1.98
Man
1.93
Man
1.83
MAN
1.73
man
1.72
MAN
1.50
mans
1.32
woman
1.29
oman
1.16
homem
1.15
Activations Density 0.080%