Explanations
terms and phrases related to fake news and deception
Negative Logits
Token            Logit
tarifa           -0.69
حوالہ            -0.69
اریخ             -0.68
volando          -0.68
chacun           -0.65
parlant          -0.65
timbangkan       -0.64
pribadi          -0.64
DriverManager    -0.64
debout           -0.64
Positive Logits
Token            Logit
fake             1.39
Fake             1.28
false            1.23
fake             1.22
Fake             1.18
False            1.15
fakes            1.13
False            1.13
false            1.10
hoax             1.06
Activations Density: 0.280%
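
As a rough illustration of how an activation density figure like the one above might be read, the sketch below assumes density means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [0.8, 1.5, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.280% would mean the feature fires on roughly 28 of every 10,000 tokens, which fits a feature that responds only to a narrow family of tokens.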