Explanations
expressions related to political governance and standards
Negative Logits
Token         Logit
demokrat      -0.17
democr        -0.17
avad          -0.16
érc           -0.16
Democratic    -0.15
Democr        -0.15
AllWindows    -0.15
obic          -0.15
Bias          -0.15
Democrat      -0.15
Positive Logits
Token         Logit
leadership    0.19
vọng          0.17
party         0.16
wing          0.16
Leadership    0.16
desert        0.16
Kemp          0.15
306           0.15
moderation    0.15
deform        0.14
Activation density: 0.095%
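To make the density figure concrete, here is a minimal sketch, assuming that "activation density" means the fraction of token positions on which the feature's activation is strictly positive (the dashboard does not state its exact definition or threshold). The function name activation_density and the 20,000-token example are hypothetical, chosen only so the arithmetic lands on 0.095%: a feature firing on 19 of 20,000 tokens has density 19 / 20,000 = 0.00095.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: density = share of tokens with activation > threshold (default 0).
    This is an illustrative definition, not the dashboard's confirmed one.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical example: a feature that fires on 19 of 20,000 tokens.
acts = [0.0] * 20_000
for i in range(19):
    acts[i * 1000] = 1.0  # mark 19 evenly spaced positions as firing

print(f"{activation_density(acts):.3%}")  # prints 0.095%
```

A density this low means the feature is silent on the vast majority of tokens, which is typical of the sparse features such dashboards describe.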