Explanations
Terms related to gender identity and gender norms
Head Attribution Weights
Head 0: 0.02
Head 1: 0.02
Head 2: 0.07
Head 3: 0.21
Head 4: 0.03
Head 5: 0.02
Head 6: 0.17
Head 7: 0.19
Head 8: 0.04
Head 9: 0.04
Head 10: 0.07
Head 11: 0.07
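The head attribution weights listed above show that heads 3, 7 and 6 carry most of this feature. A minimal sketch of how such weights might be derived follows. It assumes the feature comes from an SAE trained on the concatenated per-head attention outputs, and that each head's weight is its slice's share of the decoder vector's L2 norm. That method, and the helper names `head_attribution` and `top_heads`, are hypothetical illustrations, not the dashboard's confirmed implementation.

```python
import math

# Head attribution weights as listed above (head index -> weight).
HEAD_ATTR = {0: 0.02, 1: 0.02, 2: 0.07, 3: 0.21, 4: 0.03, 5: 0.02,
             6: 0.17, 7: 0.19, 8: 0.04, 9: 0.04, 10: 0.07, 11: 0.07}


def head_attribution(decoder_row, n_heads):
    """Hypothetical: split a decoder vector over concatenated head outputs
    into n_heads equal slices and return each slice's share of the summed
    per-slice L2 norms (assumed attribution method)."""
    if len(decoder_row) % n_heads:
        raise ValueError("decoder length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = [
        math.sqrt(sum(x * x for x in decoder_row[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        # An all-zero decoder row attributes nothing to any head.
        return [0.0] * n_heads
    return [n / total for n in norms]


def top_heads(weights, k=3):
    """Return the k head indices with the largest attribution weight."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    print(top_heads(HEAD_ATTR))                   # heads ranked by listed weight
    print(head_attribution([3.0, 4.0, 0.0, 0.0], 2))
```

With the listed values, `top_heads(HEAD_ATTR)` returns `[3, 7, 6]`. The rounded weights sum to 0.95 rather than 1, which may reflect rounding or a different normalization than the one assumed here.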
Negative Logits
Pradesh: -1.34
DERR: -1.21
ו: -1.17
ventory: -1.10
Airways: -1.10
Lord: -1.10
Priv: -1.09
Wil: -1.09
schild: -1.09
ドラゴン: -1.08
Positive Logits
shaming: 1.19
deport: 1.15
seniors: 1.11
inki: 1.05
Dems: 1.05
remix: 1.04
ornament: 1.03
dystop: 1.02
alogy: 1.02
activism: 1.01
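Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below illustrates that under stated assumptions. It assumes the decoder direction is already expressed in the residual stream; for an attention-output feature this would include the output projection. It also ignores any final layer norm. The function names and the toy matrices are hypothetical.

```python
def feature_logits(decoder_dir, W_U):
    """Direct logit effect of a feature: decoder_dir (length d_model)
    multiplied by W_U (d_model rows x vocab columns), one value per token."""
    vocab = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab)
    ]


def top_and_bottom(logits, tokens, k):
    """Return the k highest (descending) and k lowest (ascending) tokens."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])
    bottom = [(tokens[t], logits[t]) for t in order[:k]]
    top = [(tokens[t], logits[t]) for t in reversed(order[-k:])]
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    decoder_dir = [1.0, -1.0]
    W_U = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 0.0]]
    logits = feature_logits(decoder_dir, W_U)
    print(top_and_bottom(logits, ["a", "b", "c"], k=1))
```

Here the toy logits are `[1.0, -1.0, 2.0]`, so token `c` is the top positive logit and `b` the top negative one. The dashboard's lists correspond to `k=10` over the real vocabulary.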
Activation Density: 0.012%
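Activation density is usually read as the fraction of tokens on which the feature fires. A density of 0.012% is 0.00012 as a fraction, or roughly 12 tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above zero. The helper names are hypothetical, and the threshold is an assumed parameter.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold
    (threshold of 0.0 is an assumption, not a documented default)."""
    if not activations:
        raise ValueError("need at least one activation")
    return sum(1 for a in activations if a > threshold) / len(activations)


def expected_firings(density_percent, n_tokens):
    """Convert a density given in percent into an expected token count."""
    return density_percent / 100 * n_tokens


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 1.5, 0.0]))
    print(expected_firings(0.012, 100_000))  # about 12 tokens
```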