INDEX
Explanations
mentions of young females, particularly in the context of relationships or societal roles
repeated mentions of the term "girl."
Negative Logits
Token        Logit
rehend       -0.81
psey         -0.76
raltar       -0.74
PDATE        -0.73
HAEL         -0.72
aeda         -0.68
odcast       -0.68
utherford    -0.68
arily        -0.68
fusc         -0.68
Positive Logits
Token        Logit
Scouts       0.99
girl         0.89
girls        0.88
girls        0.86
panties      0.86
hood         0.83
bang         0.82
dolls        0.80
folk         0.80
herself      0.78
Activations Density 0.037%
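
As a rough illustration of what a density figure like this one measures, the sketch below computes the fraction of token positions on which a feature's activation is above zero. It assumes the density is defined as that fraction, which the source does not state. The function name `activation_density` and the sample activations are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the source does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, with the feature firing on 4.
sample = [0.0] * 10_000
for i in (12, 512, 4096, 9000):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```

Under this reading, a density of 0.037% would mean the feature fires on roughly 37 of every 100,000 token positions.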