INDEX
Explanations
references to familial relationships and gender roles
Head Attr Weights
0:0.11
1:0.09
2:0.05
3:0.08
4:0.04
5:0.09
6:0.05
7:0.02
8:0.08
9:0.14
10:0.07
11:0.11
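
The head attribution weights above can be loaded and ranked with a few lines of Python. This is a minimal sketch, assuming each `index:value` pair is a per-head attribution weight for heads 0 to 11; the names `head_attr_weights` and `top_heads` are hypothetical and not part of the dashboard. The listed values sum to 0.93; the source does not say whether that is due to rounding or because the weights are not normalized.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Interpreting them as per-attention-head scores is an assumption.
head_attr_weights = {
    0: 0.11, 1: 0.09, 2: 0.05, 3: 0.08,
    4: 0.04, 5: 0.09, 6: 0.05, 7: 0.02,
    8: 0.08, 9: 0.14, 10: 0.07, 11: 0.11,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight.

    Ties keep their original order because Python's sort is stable.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the listed weights, useful for checking whether they are normalized.
total = sum(head_attr_weights.values())

if __name__ == "__main__":
    print(top_heads(head_attr_weights))
    print(round(total, 2))
```

Ranking by weight shows that head 9 carries the largest listed weight (0.14), followed by heads 0 and 11 (0.11 each).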
Negative Logits
remlin: -1.62
imaru: -1.60
76561: -1.57
aughed: -1.56
ajor: -1.55
claimed: -1.54
edIn: -1.53
ento: -1.52
answered: -1.51
ailable: -1.51
Positive Logits
segregated: 1.48
mascul: 1.46
multicultural: 1.42
Enlightenment: 1.27
Esk: 1.26
Cycling: 1.24
Muslim: 1.22
Emmanuel: 1.22
mobilize: 1.22
diversity: 1.21
Activations Density 0.044%