INDEX
Explanations
references to individuals of a specific gender, along with hints at potential scenarios involving them
occurrences of the word "woman" in various contexts
Negative Logits
ypes                 -0.90
ernels               -0.79
rament               -0.79
quickShipAvailable   -0.78
inctions             -0.75
raltar               -0.74
agascar              -0.74
kefeller             -0.73
Flavoring            -0.73
aucuses              -0.73
Positive Logits
izer                  1.27
hood                  1.14
herself               1.01
pher                  0.96
folk                  0.96
izers                 0.94
cule                  0.89
breastfeeding         0.88
vagina                0.88
menstru               0.88
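The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes the model's output toward or away from them. The page does not say how these scores are computed. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction onto the unembedding matrix and keep the k lowest and k highest entries. The sketch below illustrates that assumption in plain Python. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `W_U` are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical: score each token as decoder_dir @ W_U.

    W_U is assumed to have shape (d_model, vocab_size), stored row by row.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # descending
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["herself", "ernels", "folk"]
W_U = [[1.0, -0.5, 0.2],
       [0.0, 0.3, 0.4]]
decoder_dir = [1.0, 0.5]
scores = logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(scores, vocab, k=1)
print(negative, positive)
```

The dashboard's lists would correspond to `k=10` over the full vocabulary, which is why subword fragments such as "izer" and "ernels" appear. In a real model the unembedding matrix would normally be held as an array and the projection done with a single matrix-vector product rather than Python loops.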
Activation Density: 0.061%
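The activation density figure is most plausibly the percentage of sampled tokens on which the feature's activation is nonzero. This reading is an assumption, because the page does not define the metric or the threshold. Under that reading, 0.061% means the feature fires on roughly 61 of every 100,000 tokens. The following sketch computes density from a list of per-token activations. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical: percentage of tokens whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 61 active tokens out of 100,000 gives the density shown on the dashboard.
acts = [0.0] * 99_939 + [0.5] * 61
print(f"{activation_density(acts):.3f}%")
```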