INDEX
Explanations
references to male characters or individuals
Negative Logits
isser      -0.16
ernes      -0.15
lemn       -0.15
ynos       -0.14
бот        -0.14
orno       -0.14
emean      -0.14
permalink  -0.14
intptr     -0.14
dana       -0.14
Positive Logits
cke        0.17
inel       0.17
ifa        0.16
Welch      0.15
eka        0.15
zan        0.15
utz        0.15
Sheridan   0.15
ny         0.14
Clifford   0.14
Activations Density 0.054%