Explanations
words related to lies and deception
references to deception or false information, particularly in relation to medical, environmental, and social issues
Negative Logits
adelphia    -0.92
Mobility    -0.90
ktop        -0.83
Schwar      -0.82
Ports       -0.82
allery      -0.77
Rail        -0.74
ahime       -0.74
corridor    -0.74
esa         -0.74
Positive Logits
falsehood         1.76
debunk            1.63
hoax              1.58
misinformation    1.53
factual           1.52
false             1.51
disinformation    1.51
truth             1.50
truthful          1.49
False             1.49
Activations Density: 0.899%