Explanations
references to men and male-related terms
Negative Logits
FactoryBot   -0.17
Sad          -0.17
ernote       -0.16
imler        -0.15
atik         -0.15
piring       -0.15
AML          -0.14
lassen       -0.14
Ley          -0.14
rát          -0.14
Positive Logits
opause    0.32
endez     0.26
orca      0.25
ager      0.23
ubar      0.23
cken      0.23
udo       0.23
elik      0.22
ninger    0.22
acing     0.22
Activation Density: 0.014%
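The activation density reports how often this feature fires. Here is a minimal sketch that assumes density means the percentage of token positions where the feature's activation is strictly greater than zero. The zero threshold, the helper name `activation_density`, and the sample data are illustrative assumptions, not the dashboard's own code:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 14 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(14):
    acts[i * 7000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3f}%")
```

At 0.014%, a feature like this one is active on roughly 14 of every 100,000 tokens.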