Explanations
news headlines or article sections with sensational or alarming content
instances of the word "READ" or related calls to action
Negative Logits
angers       -0.70
perties      -0.69
aea          -0.65
asketball    -0.64
phies        -0.63
reconnect    -0.61
phia         -0.61
wine         -0.60
Johns        -0.60
paid         -0.59
Positive Logits
ING     1.27
INGS    1.22
ALSO    1.15
TY      1.14
MORE    1.14
BOOK    1.09
TING    1.09
NESS    1.07
ME      1.06
WOR     1.06
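Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding and keeping the k most negative and k most positive tokens. The sketch below illustrates that ranking under this assumption. The decoder direction, the two-dimensional "unembedding" columns, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical toy values, not this dashboard's actual data or code; real pipelines may also apply a final layer norm first.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (hypothetical) feature decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most positive, k most negative) token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy 2-d example; values are made up for illustration only.
    decoder_dir = [1.0, -0.5]
    unembed = {
        "ING": [1.0, 0.0],
        "ALSO": [0.5, 0.0],
        "wine": [-0.5, 0.5],
        "paid": [0.0, 1.0],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```

With these toy vectors, "ING" and "ALSO" rank as the top positive tokens and "wine" and "paid" as the most negative.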
Activations Density 0.023%
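Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is above zero, so 0.023% corresponds to roughly 23 firing tokens per 100,000. The snippet below is a minimal sketch assuming that definition and a threshold of 0; the function name `activation_density` and the synthetic activation list are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

if __name__ == "__main__":
    # Synthetic sample: 23 nonzero activations out of 100,000 tokens.
    acts = [0.0] * 99_977 + [1.0] * 23
    print(f"Activation density {activation_density(acts):.3%}")
```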