Explanations
instances of opinion pieces or op-eds
references to op-eds, political figures, and political issues
Head Attribution Weights
Head 0: 0.06
Head 1: 0.02
Head 2: 0.24
Head 3: 0.05
Head 4: 0.24
Head 5: 0.05
Head 6: 0.03
Head 7: 0.02
Head 8: 0.06
Head 9: 0.10
Head 10: 0.05
Head 11: 0.02
Negative Logits
inki: -1.57
igans: -1.51
fra: -1.38
Dane: -1.19
Tsu: -1.18
Wolves: -1.16
Kag: -1.15
wig: -1.15
Guarant: -1.15
Jae: -1.13
Positive Logits
lisher: 1.70
ciation: 1.44
ocument: 1.42
lished: 1.41
monary: 1.40
BLIC: 1.39
ש: 1.36
CLS: 1.35
estate: 1.33
CTR: 1.33
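Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the largest and smallest entries. The sketch below shows that projection under stated assumptions: `decoder_dir` is assumed to already be a residual-stream direction (for an attention-output SAE it would first need to pass through the output projection W_O), no final layer norm is applied, and `logit_effects` is a hypothetical helper, not the dashboard's actual code.

```python
import numpy as np


def logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Hypothetical helper: direct logit effect of a feature direction.

    decoder_dir: shape (d_model,), assumed residual-stream write direction.
    W_U: shape (d_model, d_vocab), unembedding matrix.
    Returns (top_positive, top_negative) as lists of (token, logit) pairs.
    """
    logits = decoder_dir @ W_U  # one logit contribution per vocabulary token
    order = np.argsort(logits)  # token indices sorted by ascending logit
    top_neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_pos, top_neg


# Toy usage: a 2-dimensional model with a 3-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0],
                [0.0, 2.0, 0.0]])
pos, neg = logit_effects(np.array([1.0, 0.25]), W_U, ["a", "b", "c"], k=2)
print(pos)  # [('a', 1.0), ('b', 0.5)]
print(neg)  # [('c', -1.0), ('b', 0.5)]
```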
Activation Density: 0.007%
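Activation density is usually read as the fraction of sampled token positions on which the feature is active, so 0.007% corresponds to roughly 7 firing positions per 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the hypothetical `threshold` parameter); the dashboard's exact threshold and sample set are not stated here.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires above `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's setting.
    """
    return float(np.mean(acts > threshold))


# Toy example: the feature fires on 7 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:7] = 1.3
print(f"{activation_density(acts):.3%}")  # 0.007%
```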