INDEX
Explanations
names of politicians and political parties
mentions of prominent political figures and their affiliations
Negative Logits
withd        -0.68
newcom       -0.61
SAR          -0.60
conclud      -0.60
CentOS       -0.58
Agric        -0.58
scrut        -0.58
Oper         -0.58
auxiliary    -0.56
redes        -0.56
Positive Logits
"             0.94
disgrace      0.93
"'            0.90
unfairly      0.88
impe          0.87
rigged        0.85
"#            0.85
hypocritical  0.85
"…            0.84
deserved      0.83
Activations Density 0.830%