INDEX
Explanations
references to independence movements and self-determination
Negative Logits
olik       -0.15
pll        -0.15
ington     -0.15
obox       -0.14
вор        -0.14
aight      -0.14
ODY        -0.14
рович      -0.14
_requires  -0.13
Limit      -0.13
Positive Logits
separat       0.24
dev           0.19
autonom       0.18
se            0.18
翼            0.18
Separ         0.18
independence  0.18
autonomy      0.17
grievances    0.17
minority      0.16
Activations Density 0.131%