Explanations
references to political parties and ideologies, especially those associated with extremism or historical significance
Negative Logits
eskort    -0.16
olley     -0.15
kå        -0.15
Bias      -0.14
etal      -0.14
CHA       -0.14
rotch     -0.14
celik     -0.14
assis     -0.14
ester     -0.14
Positive Logits
party         0.26
split         0.24
splits        0.23
spl           0.23
formations    0.22
Party         0.20
splitter      0.20
ide           0.20
contest       0.20
membership    0.20
Activations Density: 0.074%