Explanations
names of news outlets in the text
news outlet names
Negative Logits
figure        -0.74
}}}           -0.70
obser         -0.65
Interstitial  -0.63
gradient      -0.62
emort         -0.61
taboola       -0.60
vulner        -0.60
cffffcc       -0.59
fig           -0.57
Positive Logits
that          1.08
that          0.92
he            0.84
they          0.84
there         0.76
she           0.75
覚醒          0.73
it            0.69
THAT          0.67
yesterday     0.65
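The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way such lists are produced, assumed here rather than confirmed by this dashboard, is to take the feature's decoder direction, dot it with each token's unembedding vector, and keep the k most negative and k most positive scores. The sketch below uses pure Python and a made-up 2-dimensional toy model; the helpers `logit_effects` and `top_and_bottom` are hypothetical, and any final LayerNorm or rescaling a real pipeline might apply is ignored.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: a sketch of one common way to build
# "Negative Logits" / "Positive Logits" lists for a single feature.

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumption: no
    final LayerNorm or other rescaling is applied)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest-scoring tokens in
    descending order and the k lowest-scoring tokens in ascending order."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative

# Toy 2-d example with made-up numbers (not taken from the dashboard).
decoder_dir = [1.0, -0.5]
unembed = {
    "that":   [1.0, 0.0],
    "he":     [0.5, 0.2],
    "figure": [-0.5, 0.5],
    "fig":    [0.0, 1.0],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; on a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` with `key=` would avoid a full sort.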
Activation Density: 0.106%
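Activation density is usually read as the share of tokens, over some sample of text, on which the feature is active. Assuming "active" means a strictly positive activation, 0.106% corresponds to the feature firing on roughly 1 token in 943. A minimal sketch of that computation, using a hypothetical `activation_density` helper and made-up activations:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumption: the dashboard counts strictly positive activations)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Made-up sample: the feature fires on 53 of 50,000 tokens.
sample = [0.0] * 49_947 + [1.3] * 53
print(f"{activation_density(sample):.3f}%")  # 0.106%
```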