INDEX
Explanations
terms related to political matters or situations
Negative Logits
tered   -0.84
ibles   -0.80
olen    -0.78
wered   -0.75
lished  -0.75
imates  -0.74
imus    -0.72
upon    -0.71
Colt    -0.70
plin    -0.69
Positive Logits
correctness  1.41
upheaval     1.07
instability  0.99
turmoil      0.98
uphe         0.94
leaders      0.92
clout        0.91
stability    0.89
prisoners    0.89
parties      0.89
Activations Density 0.036%