Explanations
references to girls and women in various contexts
NEGATIVE LOGITS
elay      -0.17
vig       -0.16
esus      -0.16
ibold     -0.16
byss      -0.16
forth     -0.16
abis      -0.15
amsung    -0.15
kdir      -0.15
bsolute   -0.15
POSITIVE LOGITS
hood      0.24
-boy      0.20
boys      0.17
/y        0.16
/man      0.16
phép      0.16
們        0.15
friends   0.15
friend    0.15
انه       0.15
Activations Density 0.063%