INDEX
Explanations
mentions of human rights violations and their consequences
Negative Logits
.addHandler   -0.16
ekl           -0.15
ogg           -0.15
ánh           -0.14
oci           -0.14
цин           -0.13
Hoch          -0.13
неприят       -0.13
obra          -0.13
اني           -0.13
Positive Logits
arbitrary     0.34
extr          0.31
summary       0.31
Arbitrary     0.28
disappear     0.28
torture       0.27
Extr          0.26
extra         0.26
summary       0.25
Summary       0.24
Activations Density 0.040%
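
The density figure above is most naturally read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the dashboard's exact definition (for example, its firing threshold) is an assumption here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.040%"
```

Under this reading, a density of 0.040% corresponds to roughly 4 active tokens per 10,000, which fits a sparse feature tied to a narrow topic.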