INDEX
Explanations
phrases indicating support or favor, particularly with a strong emphasis on the word "of" and numerical modifiers
phrases related to political debates and contentious topics
Head Attr Weights
0:0.03
1:0.02
2:0.10
3:0.06
4:0.34
5:0.09
6:0.02
7:0.02
8:0.09
9:0.14
10:0.03
11:0.02
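The weights above are easier to read once ranked. The following is a minimal sketch, assuming each entry is a `head:weight` pair and that a larger weight means a larger share of the attribution; the names `head_weights` and `rank_heads` are hypothetical and not part of the dashboard.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.02, 2: 0.10, 3: 0.06, 4: 0.34, 5: 0.09,
    6: 0.02, 7: 0.02, 8: 0.09, 9: 0.14, 10: 0.03, 11: 0.02,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)

# The listed values are rounded, so the total need not be exactly 1.
total = round(sum(head_weights.values()), 2)

top_head, top_weight = ranked[0]
top_three_share = round(sum(w for _, w in ranked[:3]), 2)

print(f"top head: {top_head} ({top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
print(f"share held by top three heads: {top_three_share:.2f}")
```

On these numbers, head 4 carries the largest weight (0.34), followed by heads 9 and 2. The listed weights sum to 0.96 rather than 1, which is consistent with rounding to two decimals.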
Negative Logits
pload: -1.37
satisf: -1.29
otide: -1.23
omach: -1.20
��: -1.20
atoon: -1.19
revel: -1.19
�: -1.18
Definition: -1.17
chore: -1.17
Positive Logits
Associates: 1.37
Brett: 1.32
extremes: 1.28
luence: 1.22
Majority: 1.16
animous: 1.16
kB: 1.15
aline: 1.14
groups: 1.13
berus: 1.12
Activations Density 0.007%
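To put the density figure in concrete terms, here is a small sketch that converts the percentage into an expected count of active tokens. It assumes the density is the fraction of tokens on which the feature has a nonzero activation, which the listing does not state; `expected_active_tokens` is a hypothetical helper.

```python
# Activation density from the listing above, in percent.
DENSITY_PERCENT = 0.007


def expected_active_tokens(density_percent, n_tokens):
    """Expected number of tokens with a nonzero activation.

    Assumes density is the fraction of tokens on which the feature fires.
    """
    return density_percent / 100.0 * n_tokens


per_million = expected_active_tokens(DENSITY_PERCENT, 1_000_000)
print(f"about {per_million:.0f} active tokens per million")
```

Under that assumption the feature fires on roughly 70 tokens per million, so it is very sparse.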