INDEX
Explanations
references to male and female characters in various contexts
Negative Logits
Token           Logit
plode           -0.16
_persona        -0.15
اÙħبر            -0.15
adelphia        -0.15
ocard           -0.14
themselves      -0.14
leine           -0.14
หม              -0.14
RuntimeObject   -0.14
zcze            -0.14
Positive Logits
Token           Logit
named           0.50
named           0.36
whose           0.29
Named           0.29
whom            0.29
name            0.28
who             0.28
called          0.27
names           0.27
Named           0.27
Activations Density: 0.165%