INDEX
Explanations
phrases related to schemes and efforts to influence or disrupt events or individuals
phrases indicating attempts or efforts related to influence or control
Negative Logits
Token      Logit
checked    -0.70
avia       -0.68
Collider   -0.66
Steps      -0.66
Typ        -0.65
etus       -0.63
Printed    -0.63
Chili      -0.62
Needs      -0.61
Choice     -0.61
Positive Logits
Token        Logit
promote      1.32
deceive      1.29
discredit    1.28
maximize     1.27
intimidate   1.27
divert       1.26
capitalize   1.26
deprive      1.26
minimize     1.25
undermine    1.23
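
Token/logit lists like the ones above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. Whether this page's values were produced that way is an assumption. The sketch below illustrates the idea on toy data; `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the toy matrix is made up for illustration.

```python
import numpy as np

def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive tokens by logit effect.

    decoder_dir: (d_model,) feature direction (assumed, not from the page)
    unembed:     (d_model, vocab_size) unembedding matrix (assumed)
    vocab:       list of vocab_size token strings
    """
    logits = decoder_dir @ unembed          # one logit effect per token
    order = np.argsort(logits)              # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional model, 3-token vocabulary (illustrative values only).
toy_dir = np.array([1.0, 0.0])
toy_unembed = np.array([[1.32, -0.70, 0.10],
                        [0.00,  0.00, 0.00]])
toy_vocab = ["promote", "checked", "the"]

negative, positive = top_logit_tokens(toy_dir, toy_unembed, toy_vocab, k=1)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, a positive logit means the feature pushes the model toward predicting that token when it fires, and a negative logit means it pushes away from it.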
Activations Density 0.272%
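
An activation density figure such as the one above is usually read as the fraction of sampled tokens on which the feature fires at all. The exact definition used here is not stated, so the following is a minimal sketch under that assumption; `activation_density` and the zero threshold are hypothetical choices.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds the threshold.

    The threshold of 0.0 is an assumption; other cutoffs are possible.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy data: the feature fires on 3 of 1000 tokens (illustrative values only).
toy_acts = np.zeros(1000)
toy_acts[:3] = [0.5, 1.2, 0.1]

density = activation_density(toy_acts)
print(f"density = {density:.3%}")
```

A density well below 1%, as reported on this page, indicates a sparse feature that is inactive on the vast majority of tokens.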