INDEX
Explanations
instances of inaccuracies or falsehoods in a text
terms related to deceptive or incorrect information and its consequences
Negative Logits
negie          -0.77
oubted         -0.74
GOODMAN        -0.71
ICA            -0.71
rolet          -0.70
verning        -0.70
ampions        -0.68
ificantly      -0.68
atorium        -0.67
interrupted    -0.66
Positive Logits
syndrome       1.01
glers          0.92
Syndrome       0.85
perpetrated    0.80
manship        0.73
imaginable     0.69
Exception      0.68
ulence         0.67
mas            0.67
practices      0.66
Activations Density 0.369%