Explanations
references to political corruption and power dynamics within the political system
Negative Logits
Token       Logit
okus        -0.18
uti         -0.14
endors      -0.14
ius         -0.14
ues         -0.13
eries       -0.13
(pub        -0.13
COVID       -0.13
.books      -0.13
ä½ı         -0.13
Positive Logits
Token       Logit
Wall        0.35
New         0.29
Associated  0.27
Times       0.26
article     0.26
Guardian    0.26
Washington  0.26
_New        0.25
Wall        0.25
LA          0.25
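The page does not say how the two logit lists are computed. One common approach in sparse-autoencoder dashboards is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix, then show the k largest and k smallest results. The sketch below assumes that approach. `logit_effects`, `top_and_bottom_logits`, and the toy weights are hypothetical and are not this feature's actual parameters.

```python
def logit_effects(decoder_direction, unembed):
    """Dot the decoder direction with each column of the unembedding matrix.

    Assumption: unembed is a list of d_model rows, each holding vocab_size values.
    """
    if len(decoder_direction) != len(unembed):
        raise ValueError("decoder_direction length must equal d_model")
    vocab_size = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_direction, unembed))
        for j in range(vocab_size)
    ]


def top_and_bottom_logits(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with d_model = 2 and a 3-token vocabulary (made-up weights)
vocab = ["Wall", "okus", "New"]
decoder_direction = [1.0, 0.5]
unembed = [
    [0.2, -0.1, 0.0],
    [0.3, -0.2, 0.4],
]
effects = logit_effects(decoder_direction, unembed)  # approx [0.35, -0.2, 0.2]
positive, negative = top_and_bottom_logits(effects, vocab, k=1)
print(positive[0][0], negative[0][0])  # Wall okus
```

Under this reading, a positive value means the feature pushes the model toward predicting that token, and a negative value means it pushes the model away from it.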
Activation Density: 0.283%
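The page does not define activation density. Here it is assumed to mean the percentage of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.283% works out to about 2.83 activating tokens per 1,000, so the feature fires rarely. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 283 active tokens out of 100,000 sampled tokens
acts = [1.0] * 283 + [0.0] * (100_000 - 283)
print(f"{activation_density(acts):.3f}%")  # 0.283%
```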