INDEX
Explanations
words related to political figures, organizations, and controversies
references to conspiratorial and far-right political groups and figures
Negative Logits
concess               -0.82
Takeru                -0.72
eatures               -0.71
externalActionCode    -0.70
united                -0.70
iour                  -0.69
Ago                   -0.68
repay                 -0.67
gap                   -0.67
Ng                    -0.66
Positive Logits
ervative               1.12
ervatives              1.05
adherents              1.00
memes                  0.87
theories               0.86
ideology               0.86
tracts                 0.84
heresy                 0.83
propaganda             0.82
extremist              0.82
Activations Density 0.206%
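
As a rough illustration of what a density figure like the one above could mean, the sketch below treats activation density as the fraction of token positions on which the feature fires. This definition, the function name `activation_density`, the `threshold` parameter, and the sample counts are all assumptions made for illustration; none of them come from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: a feature "fires" on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical counts, chosen only for illustration: the feature fires on
# 206 out of 100,000 token positions.
sample = [0.0] * 99_794 + [1.0] * 206
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a low density means the feature is sparse: it is inactive on the vast majority of tokens and fires only in a narrow set of contexts.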