INDEX
Explanations
terms related to politics
repeated mentions of the term "political" and related concepts
Negative Logits
Token     Logit
actory    -0.89
olen      -0.88
wered     -0.87
Cancel    -0.84
ighed     -0.76
IER       -0.76
tered     -0.75
imates    -0.75
uses      -0.75
imus      -0.74
Positive Logits
Token          Logit
correctness    1.28
affiliation    0.98
affili         0.97
pund           0.94
clout          0.94
campaigns      0.91
activism       0.87
rhetoric       0.87
persuasion     0.86
leaders        0.85
Activations Density: 0.033%