Explanations
references to women
mentions of "woman" in various contexts
Negative Logits
ypes        -0.99
raltar      -0.82
kefeller    -0.82
Flavoring   -0.77
vernment    -0.76
ernels      -0.76
ype         -0.76
UFF         -0.75
aucuses     -0.74
emetery     -0.72
Positive Logits
izer        1.22
hood        1.09
herself     1.09
pher        1.00
folk        0.95
vagina      0.93
cule        0.92
izers       0.90
menstru     0.89
Louise      0.89
Activations Density: 0.059%