INDEX
Explanations
mentions of political matters or terms
references to political issues or contexts
Negative Logits
actory      -0.84
imus        -0.82
imates      -0.82
olen        -0.82
ibles       -0.81
tered       -0.81
upon        -0.77
Customer    -0.74
eret        -0.74
ergic       -0.74
Positive Logits
correctness       1.33
upheaval          0.98
unrest            0.92
affiliation       0.91
persuasion        0.91
affili            0.91
subdivision       0.89
considerations    0.88
leaders           0.87
activism          0.87
Activations Density: 0.036%