INDEX
Explanations
references to societal issues and the implications of social structures
Head Attribution Weights
0:0.06
1:0.05
2:0.03
3:0.03
4:0.03
5:0.47
6:0.02
7:0.02
8:0.05
9:0.08
10:0.08
11:0.04
Negative Logits
wagen     -2.13
qt        -2.12
selage    -2.09
wo        -2.07
eca       -2.00
osa       -1.94
orah      -1.94
xi        -1.93
ela       -1.90
thens     -1.90
Positive Logits
respons         2.37
notions         2.21
considerations  2.16
bias            2.09
affinity        2.07
supportive      2.06
interests       2.04
expectations    2.00
patronage       1.99
oppos           1.99
Activations Density 0.237%