Explanations
instances of political dishonesty or rhetoric
Negative Logits
robbers         -0.56
raped           -0.54
Clik            -0.52
mú              -0.50
RotationOrder   -0.49
kirj            -0.48
robbing         -0.48
robbery         -0.48
robberies       -0.47
شاف             -0.47
Positive Logits
misinformation  1.05
disinformation  0.90
unfounded       0.89
propaganda      0.88
sensational     0.86
unsub           0.82
false           0.81
hysteria        0.80
demag           0.78
rhetoric        0.78
Activation density: 0.634%