INDEX
Explanations
mentions of "fake news" and related deceptive or misleading information
Negative Logits
Token           Logit
tamment         -0.76
debout          -0.74
primaire        -0.74
Portail         -0.72
NamedQueries    -0.71
তথ্যসূত্র        -0.71
volando         -0.70
berdayakan      -0.70
participaron    -0.69
canzoni         -0.69
Positive Logits
Token      Logit
fake       1.67
Fake       1.52
Fake       1.45
fake       1.40
faux       1.29
false      1.27
phony      1.18
Faux       1.18
fakes      1.17
pretend    1.15
Activations Density 0.227%
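
The source does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading on toy data. The function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (default 0.0).
    The dashboard's actual definition is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 2 of them active.
toy_activations = [0.0] * 998 + [3.1, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.227% would mean the feature is active on roughly 2 to 3 of every 1,000 sampled tokens, which fits a narrow feature that responds to a specific family of words.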