INDEX
Explanations
references to women and their roles or experiences in various contexts
Negative Logits
/she  -0.20
females  -0.18
妻  -0.17  (Chinese/Japanese: "wife")
ython  -0.17
Female  -0.17
female  -0.16
жен  -0.16  (Russian stem, as in "жена", "wife")
meisje  -0.16  (Dutch: "girl")
himself  -0.15
guys  -0.15
Positive Logits
hood  0.28
folk  0.24
/man  0.22
ized  0.22
izing  0.22
izers  0.22
izer  0.21
zimmer  0.21
empowerment  0.19
/m  0.17
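Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding and reading off the most negative and most positive token scores. The sketch below assumes exactly that; this page does not say how its values were computed. The function name `logit_effects`, the dict-of-columns layout for the unembedding, and any toy inputs are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: rank tokens by the feature's direct effect on their logits.

    Assumes `unembed` maps each token to its unembedding column (same length as
    `decoder_dir`), so each score is a plain dot product.
    """
    # Dot product of the decoder direction with every token's unembedding column.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending by score: the front holds the most suppressed tokens.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive
```

With a real model, `decoder_dir` would be one row of the SAE decoder and `unembed` the model's unembedding matrix; a tiny hand-built vocabulary is enough to check the ranking logic.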
Activation density: 0.057%
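Activation density is usually read as the percentage of token positions on which the feature's activation is above zero. On that reading, 0.057% works out to roughly 57 positions per 100,000 tokens, or about one in 1,750. A minimal sketch under that assumption; the `activation_density` helper and its `threshold` parameter are hypothetical, not this dashboard's code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions where the feature fires.

    Assumes "fires" means activation strictly greater than `threshold`.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty input rather than dividing by zero.
    return 100.0 * active / total if total else 0.0
```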