Explanation: terms and themes related to femininity and gender roles
NEGATIVE LOGITS
ution      -0.18
orda       -0.14
ainers     -0.14
ivar       -0.14
icators    -0.14
urement    -0.14
hesion     -0.14
arton      -0.14
eração     -0.14
etu        -0.13
POSITIVE LOGITS
gy          0.35
ky          0.33
upy         0.31
chy         0.31
ppy         0.31
py          0.30
ipy         0.30
isty        0.30
ty          0.30
zy          0.29
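The dashboard does not say how these logit lists are produced. A common approach for sparse-autoencoder features, assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token scores. The sketch below uses plain Python lists; `feature_logits`, its parameters, and the toy numbers are hypothetical, and real dashboards may also normalize or scale the scores.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction promotes (positive) or suppresses (negative) them.
def feature_logits(decoder_dir, unembed, vocab, k=10):
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    decoder_dir: list[float] of length d_model (the feature's decoder vector)
    unembed: list[list[float]] of shape d_model x vocab_size (unembedding matrix)
    vocab: list[str] of length vocab_size
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the decoder direction
    # with column j of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    top_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    top_negative = sorted(pairs, key=lambda p: p[1])[:k]
    return top_positive, top_negative


# Toy usage with made-up numbers (d_model = 2, four-token vocabulary).
pos, neg = feature_logits(
    [1.0, 0.5],
    [[0.3, 0.2, -0.1, 0.0], [0.1, 0.2, -0.1, -0.2]],
    ["gy", "ky", "ution", "orda"],
    k=2,
)
print(pos)  # "gy" and "ky" rank highest
print(neg)  # "ution" and "orda" rank lowest
```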
Activation density: 0.111%
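Activation density is usually read as the fraction of sampled tokens on which the feature fires; 0.111% would then mean roughly one token in 900. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name and threshold are hypothetical, and the dashboard's exact sampling and cutoff are not stated.

```python
# Hypothetical sketch: fraction of tokens where the feature's activation
# exceeds a threshold (assumed to be 0.0 here).
def activation_density(activations, threshold=0.0):
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy usage: 2 of 5 tokens are active, so density is 0.4 (40%).
density = activation_density([0.0, 0.0, 1.2, 0.0, 0.5])
print(f"{density:.3%}")  # formatted as a percentage
```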