INDEX
Explanations
references to various media outlets and their political implications
Head Attr Weights
0:0.02
1:0.03
2:0.05
3:0.03
4:0.04
5:0.05
6:0.16
7:0.32
8:0.04
9:0.02
10:0.08
11:0.11
Negative Logits
breeze        -1.18
ivably        -1.17
pez           -1.16
ucket         -1.16
Collection    -1.09
Generator     -1.08
wastewater    -1.08
croft         -1.08
ecause        -1.08
basin         -1.07
Positive Logits
politics      1.25
nl            1.20
ervative      1.17
mun           1.15
roots         1.13
Christian     1.10
uther         1.08
hawks         1.08
think         1.08
alike         1.07
Activations Density: 0.028%