Explanations
mentions of "fake news"
references to "fake news"
Negative Logits
ayne        -0.70
Scher       -0.70
vasive      -0.70
asse        -0.68
inence      -0.66
arious      -0.64
inished     -0.64
xus         -0.63
urdue       -0.63
Swe         -0.62
Positive Logits
worthy       1.05
rooms        1.01
feed         0.96
room         0.95
worthiness   0.92
headlines    0.87
groups       0.82
peak         0.82
coverage     0.81
reader       0.80
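In SAE feature dashboards of this kind, the positive and negative logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring tokens. The sketch below assumes that method; the source does not state how these lists were computed. The function name `top_and_bottom_logits`, its parameters, and the toy vectors and tokens are hypothetical, chosen only to illustrate the ranking step, and are not data from this feature.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector, then return the
    k highest (positive) and k lowest (negative) scoring tokens.

    Assumption: this mirrors how the dashboard builds its logit lists.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy 2-dimensional example with made-up tokens and vectors.
toy_unembedding = {
    "alpha": [1.0, 0.5],
    "beta": [0.8, 0.2],
    "gamma": [-0.6, -0.1],
    "delta": [-0.5, -0.3],
}
top, bottom = top_and_bottom_logits([1.0, 0.0], toy_unembedding, k=2)
print(top)     # [('alpha', 1.0), ('beta', 0.8)]
print(bottom)  # [('gamma', -0.6), ('delta', -0.5)]
```

With a real model, `unembedding` would hold one vector per vocabulary entry, which is why the lists above contain subword fragments such as "ayne" and "vasive" alongside whole words.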
Activation density: 0.040%
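Activation density is usually reported as the share of tokens in a sample corpus on which the feature fires. Assuming "fires" means an activation above zero, 0.040% corresponds to roughly 4 tokens in every 10,000. The minimal sketch below computes that percentage; the function name `activation_density`, the zero threshold, and the toy activations are assumptions for illustration, not the dashboard's documented procedure.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Toy data: 4 active tokens out of 10,000 gives the 0.040% figure above.
toy = [0.0] * 9_996 + [1.3, 0.7, 2.1, 0.4]
print(f"{activation_density(toy):.3f}%")  # 0.040%
```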