INDEX
Explanations
references to political parties, specifically the Communist Party
Negative Logits
èĪĪ    -0.15
inant    -0.15
説    -0.14
imbus    -0.14
"group    -0.13
à¹Ĭà¸ģ    -0.13
جÙħÙĪØ¹    -0.13
ded    -0.13
zdy    -0.13
جÙĩ    -0.13
Positive Logits
rio    0.20
yle    0.17
опол    0.16
ero    0.15
tsx    0.15
ibo    0.14
teri    0.14
igr    0.14
itive    0.14
orial    0.14
Activations Density 0.010%