INDEX
Explanations
references to fake news and concepts related to truth and falsehood
falsehoods and misrepresentation
Negative Logits
PreferredItem    -0.53
famí             -0.45
Carriera         -0.45
Beats            -0.45
الإنجليزية       -0.44
concur           -0.43
commend          -0.41
verlang          -0.40
brio             -0.40
Continental      -0.40
Positive Logits
tromper          0.57
contentLoaded    0.56
تضيفلها          0.55
errores          0.52
Wahrheit         0.52
disinformation   0.51
falsche          0.50
falsehood        0.50
fraudulent       0.50
errors           0.50
Activations Density: 0.101%