INDEX
Explanations
statements related to verification or checking of information or facts
mentions of factual claims or assertions
Negative Logits
Token      Logit
offence    -0.69
creen      -0.66
clerk      -0.65
haw        -0.64
taboola    -0.64
nm         -0.63
deed       -0.62
ransom     -0.61
lihood     -0.61
DER        -0.58
Positive Logits
Token        Logit
eworks       0.83
ilitation    0.74
tis          0.74
enza         0.72
ptive        0.72
Myth         0.71
eless        0.71
DonaldTrump  0.71
Tire         0.70
iled         0.70
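The page does not say how these logit values were produced. A common approach for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and then list the most negative and most positive vocabulary entries. The sketch below assumes that method. The function name `top_logit_effects` and the toy inputs are hypothetical, and real pipelines may also apply extra scaling such as layer norm.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k):
    """Hypothetical sketch: rank tokens by the dot product of a feature's
    decoder direction (length d_model) with each unembedding column.

    unembed is a d_model x vocab_size nested list; vocab names each column.
    Returns (k most negative, k most positive) lists of (token, value) pairs.
    """
    scores = []
    for j, token in enumerate(vocab):
        # Logit effect of the feature on token j (assumed definition).
        value = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, value))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by value
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example with d_model = 2 and a four-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, -0.2, 0.4],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)
print(pos)
```

With real model weights, the same ranking produces two lists shaped like the Negative Logits and Positive Logits tables above.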
Activation Density: 0.029%
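A density of 0.029% means the feature fires on about 29 of every 100,000 tokens. The sketch below assumes that density is the percentage of token positions whose activation is strictly above zero over some evaluation corpus. The page does not state the corpus or the threshold, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of token positions where the
    feature's activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 29 firing positions out of 100,000 tokens gives a density of about 0.029%.
acts = [1.0] * 29 + [0.0] * 99_971
print(activation_density(acts))
```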