Explanations
phrases related to politics
references to political concepts and terminology
Negative Logits
Token       Logit
actory      -0.91
olen        -0.86
imates      -0.80
tered       -0.80
wered       -0.79
IER         -0.78
Cancel      -0.77
Customer    -0.75
xt          -0.75
cellent     -0.75
Positive Logits
Token          Logit
correctness    1.34
clout          0.95
upheaval       0.93
affiliation    0.89
appoint        0.87
turmoil        0.86
affili         0.86
subdivision    0.85
affairs        0.84
pund           0.84
Activations Density 0.041%