Explanations
negative words associated with deception and misinformation
references to misinformation and its effects
Negative Logits
GOODMAN             -0.86
ufact               -0.79
natureconservancy   -0.78
hesion              -0.74
Temperature         -0.72
ederation           -0.72
entary              -0.71
bridge              -0.70
gio                 -0.69
pection             -0.68
Positive Logits
perpetrated         1.03
slander             0.96
insin               0.92
baseless            0.91
disinformation      0.91
accusations         0.91
misinformation      0.90
falsehood           0.88
bigotry             0.88
bigot               0.88
Activations Density: 0.778%