Explanations
references to protests and social movements
Negative Logits
ģ           -0.16
403         -0.16
th          -0.15
oj          -0.15
ienia       -0.15
gy          -0.15
136         -0.15
commission  -0.14
deter       -0.14
en          -0.14
Positive Logits
sko    0.24
олош   0.23
isti   0.21
ička   0.21
SKI    0.21
ič     0.20
arna   0.20
ич     0.19
porno  0.19
ски    0.18
Activations Density 0.010%