Explanations
references to women, femininity, and gender-related topics
Negative Logits
REDACTED    -0.86
UFF         -0.78
RAY         -0.75
-+-+        -0.71
REC         -0.70
Flavoring   -0.70
raltar      -0.70
方          -0.69
ypes        -0.69
rip         -0.69
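One negative-logit token is the Chinese character 方 (U+65B9). Byte-level BPE vocabularies in the GPT-2 style spell it `æĸ¹`, because each UTF-8 byte is mapped to a printable stand-in character. The sketch below assumes this model uses such a tokenizer, which the page does not state. It re-implements the `bytes_to_unicode` table from OpenAI's GPT-2 encoder and reverses it. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    # Printable Latin-1 bytes keep their own code point
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    # Every remaining byte is shifted to an unused code point >= 256, in byte order
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}

# Inverse table: stand-in character -> original byte value
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    # Hypothetical helper: map each character back to its byte, then decode as UTF-8
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(decode_token("æĸ¹"))  # 方
```

The three characters `æ`, `ĸ`, `¹` correspond to the bytes `E6 96 B9`, which form the UTF-8 encoding of U+65B9.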
Positive Logits
folk            1.25
empowerment     1.04
opausal         0.94
genital         0.94
menstru         0.93
hood            0.93
breasts         0.92
menstrual       0.85
contraceptive   0.84
reproductive    0.84
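Lists like these are often produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The page does not confirm that this is how these values were computed, so treat that as an assumption. The sketch below is written in pure Python. `w_dec` is a hypothetical decoder vector, and `w_u` is a hypothetical unembedding matrix with one row per residual dimension and one column per vocabulary token. The toy vocabulary and weights are invented and are not this feature's data.

```python
def feature_logits(w_dec, w_u):
    # logit[v] = sum over d of w_dec[d] * w_u[d][v]: the feature direction's direct effect on each token
    n_vocab = len(w_u[0])
    return [sum(w_dec[d] * w_u[d][v] for d in range(len(w_dec))) for v in range(n_vocab)]

def top_and_bottom(logits, vocab, k):
    # Sort (token, logit) pairs ascending; the head is most negative, the reversed head most positive
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]

# Toy example: 2 residual dimensions, 4 placeholder tokens (all values invented)
vocab = ["alpha", "beta", "gamma", "delta"]
w_dec = [1.0, 2.0]
w_u = [
    [0.5, 0.3, -0.2, -0.1],
    [0.25, 0.2, -0.1, -0.3],
]
negative, positive = top_and_bottom(feature_logits(w_dec, w_u), vocab, k=2)
print("negative:", [tok for tok, _ in negative])  # ['delta', 'gamma']
print("positive:", [tok for tok, _ in positive])  # ['alpha', 'beta']
```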
Activation Density: 0.347%
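Reading activation density as the percentage of sampled tokens on which the feature's activation is nonzero, 0.347% means the feature fires on roughly 3.5 of every 1,000 tokens. This definition is an assumption about how the dashboard computes the figure. The sketch below uses a hypothetical `activation_density` helper and an invented list of per-token activations.

```python
def activation_density(activations, threshold=0.0):
    # Percentage of tokens whose activation is strictly above the threshold (assumed definition)
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Invented sample: 2,000 tokens, 7 of which have a positive activation
acts = [0.0] * 2000
for i in (12, 140, 141, 900, 1203, 1500, 1999):
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.350%
```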