INDEX
Explanations
mentions of Russian interference
references to Russian interference in various contexts
Negative Logits
etically    -0.84
sal         -0.81
ille        -0.76
than        -0.76
imb         -0.76
char        -0.75
chat        -0.73
neys        -0.73
xious       -0.73
\\\\\\\\    -0.72
Positive Logits
interference    1.33
meddling        1.05
interfere       0.96
interfered      0.95
destro          0.94
tampering       0.93
medd            0.88
interfering     0.87
undermin        0.79
intrusion       0.78
Activations Density: 0.011%