INDEX
Explanations
references to human rights issues and related international relations
Negative Logits
Lebanese       -0.16
.mvc           -0.15
>\<^           -0.15
รà¸Ķ            -0.15
aes            -0.15
agua           -0.14
ocop           -0.14
aney           -0.14
Maharashtra    -0.14
colomb         -0.14
Positive Logits
Xin            0.32
Uy             0.29
Turk           0.27
Han            0.25
Kash           0.24
Kaz            0.23
U              0.23
Autonomous     0.22
minority       0.22
Tur            0.21
Activations Density 0.010%