INDEX
Explanations
references to political affiliations and lobbying
NEGATIVE LOGITS
statt          -0.16
abile          -0.15
haciendo       -0.14
とい           -0.14
arada          -0.14
este           -0.14
ederek         -0.14
INTERRUPTION   -0.13
LError         -0.13
рач            -0.13
POSITIVE LOGITS
who        0.57
who        0.46
qui        0.33
Who        0.32
Who        0.30
whom       0.29
quien      0.29
whose      0.29
quienes    0.28
kteří      0.23
Activations Density 0.154%