INDEX
Explanations
conspiracies or plots involving collusion, rigging, or conspiracy theories
references to conspiracy and collusion
Negative Logits
ains         -0.73
esa          -0.70
anse         -0.68
area         -0.68
Flo          -0.68
asta         -0.67
gain         -0.67
alg          -0.67
lication     -0.66
inguished    -0.66
Positive Logits
conspiring    1.02
collusion     0.99
perpetrated   0.95
concoct       0.94
blackmail     0.93
orchestrated  0.93
sabot         0.92
eering        0.92
rigged        0.89
sabotage      0.88
Activations Density 0.118%