Explanations
references to political leadership and elections
Negative Logits
Token     Logit
ンパ      -0.16
gz        -0.16
weg       -0.15
BORDER    -0.15
apon      -0.15
Chair     -0.14
vatel     -0.14
丈        -0.14
-layout   -0.14
locals    -0.14
Positive Logits
Token     Logit
Workers   0.28
left      0.27
Dil       0.25
Worker    0.24
PT        0.24
Kir       0.24
left      0.23
Rousse    0.23
-left     0.22
Workers   0.20
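Logit lists like these are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and keeping the most positive and most negative results. The dashboard does not state its exact pipeline (for example, whether a final LayerNorm scale is folded in), so the following is only a minimal pure-Python sketch under that assumption. The function names `feature_logit_effects` and `top_and_bottom` and the toy tokens `tok_a` to `tok_d` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def feature_logit_effects(
    decoder_dir: Sequence[float],          # assumed: the feature's decoder vector (length d_model)
    unembed: Dict[str, Sequence[float]],   # assumed: token -> unembedding vector (length d_model)
) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding vector."""
    effects = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        effects[tok] = sum(d * u for d, u in zip(decoder_dir, col))
    return effects

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first, like a "Negative Logits" list
    top = ranked[::-1][:k]     # most positive first, like a "Positive Logits" list
    return top, bottom

# Toy example with d_model = 3; these vectors are illustrative, not real model weights.
decoder_dir = [1.0, 0.0, -1.0]
unembed = {
    "tok_a": [2.0, 5.0, 1.0],
    "tok_b": [0.5, 3.0, 0.0],
    "tok_c": [0.0, 1.0, 1.5],
    "tok_d": [1.0, 0.0, 2.0],
}
top, bottom = top_and_bottom(feature_logit_effects(decoder_dir, unembed), k=2)
print("positive:", top)
print("negative:", bottom)
```

Sorting once and slicing from both ends keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a vectorized matrix product followed by a top-k selection would be the practical choice.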
Activation Density: 0.025%
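An activation density of 0.025% means the feature is active on a fraction 0.00025 of token positions, or roughly 1 in every 4,000 tokens (about 250 per million). The minimal sketch below assumes that density counts positions where the feature's activation is strictly greater than zero. The dashboard's actual threshold and sampling corpus are not stated, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed > 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Toy data: the feature fires on exactly 1 of 4,000 positions.
acts = [0.0] * 3999 + [1.7]
density = activation_density(acts)
print(f"{density * 100:.3f}%")               # density expressed as a percentage
print(f"about 1 in {1 / density:,.0f} tokens")
```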