Explanations
names associated with prominent female figures
Negative Logits
thic      -0.19
edik      -0.16
ascus     -0.15
ivre      -0.15
_usec     -0.15
orks      -0.15
ouser     -0.15
ernaut    -0.14
Means     -0.14
yle       -0.14
Positive Logits
lio        0.17
enberg     0.16
tings      0.16
dol        0.15
ots        0.15
rib        0.15
gan        0.14
oration    0.14
tor        0.14
sick       0.14
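The two lists above are read as the tokens with the most negative and most positive logit effects for this feature. A minimal sketch of that selection, assuming the dashboard already has a per-token logit-effect mapping and simply takes the top k in each direction (the helper `top_k_logits` and the small `effects` dictionary are hypothetical, built from a few of the rounded values listed above):

```python
import heapq


def top_k_logits(logit_effects: dict[str, float], k: int = 10):
    """Return (most negative, most positive) (token, value) pairs.

    Hypothetical helper: assumes logit_effects maps each vocabulary token
    to this feature's effect on that token's logit.
    """
    negative = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    return negative, positive


# Hypothetical subset of the values shown in the lists above.
effects = {
    "thic": -0.19,
    "edik": -0.16,
    "ascus": -0.15,
    "lio": 0.17,
    "enberg": 0.16,
    "tings": 0.16,
}
neg, pos = top_k_logits(effects, k=2)
print(neg)  # [('thic', -0.19), ('edik', -0.16)]
print(pos)  # [('lio', 0.17), ('enberg', 0.16)]
```

Ties such as `enberg` and `tings` (both 0.16) keep their input order, because `heapq.nlargest` behaves like a stable `sorted(..., reverse=True)[:k]`.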
Activation Density: 0.055%