INDEX
Explanations
patterns related to conspiracy theories and malicious attempts
phrases related to conspiracy and plots
Negative Logits
standing      -0.78
answered      -0.76
aired         -0.73
felt          -0.71
amo           -0.70
Emerging      -0.68
checked       -0.68
enough        -0.67
vo            -0.67
apache        -0.67
Positive Logits
deceive       1.60
intimidate    1.47
undermine     1.46
mislead       1.43
assassinate   1.42
deprive       1.40
discredit     1.39
manipulate    1.39
confuse       1.34
punish        1.30
Activations Density: 0.286%