Explanations
people's names or terms related to politics and news channels, especially CNN and Fox News
Negative Logits
Token      Logit
ovych      -0.70
tip        -0.64
schild     -0.63
kil        -0.61
impulse    -0.61
¯          -0.59
WARD       -0.58
gat        -0.58
idays      -0.57
yip        -0.57
Positive Logits
Token      Logit
ulhu       1.16
estial     0.88
berus      0.84
ãĤ¼ãĤ¦ãĤ¹  0.83
ioxide     0.82
igham      0.80
ornia      0.78
emporary   0.77
estine     0.74
pillar     0.72
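
Some tokens in these lists, such as ãĤ¼ãĤ¦ãĤ¹ and ¯, look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The page does not name the tokenizer, so the following is only a sketch under the assumption that the GPT-2 style byte-to-character mapping is in use. The helper names bytes_to_unicode and decode_token are illustrative and not taken from the page.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary string back to bytes, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that are partial UTF-8 sequences decode to replacement characters.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ulhu", "ãĤ¼ãĤ¦ãĤ¹", "¯"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤ¼ãĤ¦ãĤ¹ decodes to the katakana string ゼウス, while plain ASCII tokens such as ulhu come back unchanged. A lone ¯ corresponds to a single byte that is not valid UTF-8 on its own, so it decodes to a replacement character.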
Activations Density 1.323%