INDEX
Explanations
instances of specific terms and expressions related to societal issues and actions
Head Attr Weights
0:0.07
1:0.03
2:0.04
3:0.13
4:0.07
5:0.06
6:0.11
7:0.08
8:0.05
9:0.10
10:0.14
11:0.06
Negative Logits
Sullivan    -3.11
Juice       -2.73
osite       -2.60
Robinson    -2.57
Sullivan    -2.57
Brighton    -2.56
ovy         -2.55
Raymond     -2.55
uo          -2.50
Mole        -2.49
Positive Logits
�           5.41
�           5.25
(blank)     4.02
†           3.33
«           3.14
�           3.13
vernment    2.88
pand        2.87
■           2.78
�           2.77
Activations Density 0.175%