Explanations
references to female individuals
references to the word "girl."
Negative Logits
Token     Logit
aeda      -0.86
CAST      -0.80
rehend    -0.72
henko     -0.70
destro    -0.69
PDATE     -0.69
aution    -0.68
aukee     -0.68
odcast    -0.68
insula    -0.68
Positive Logits
Token     Logit
Scouts    1.04
girls     0.87
hood      0.86
ish       0.85
girl      0.82
ishly     0.82
girl      0.81
folk      0.80
panties   0.79
girls     0.79
Activations Density: 0.020%