INDEX
Explanations
terms related to political extremism and racial tension
Negative Logits
neau        -0.82
IOR         -0.78
payer       -0.72
hower       -0.70
Veter       -0.67
cit         -0.67
ysis        -0.67
fixed       -0.66
mination    -0.65
Canal       -0.64
Positive Logits
extremists      0.93
rally           0.93
extremist       0.88
extremism       0.88
supremacists    0.86
hate            0.86
ideology        0.86
ervative        0.86
ideologies      0.85
thugs           0.84
Activations Density 0.085%