INDEX
Explanations
terms and concepts related to human rights and their violations
Negative Logits
Token          Logit
hm             -0.15
Ậ              -0.14
hek            -0.14
imentos        -0.14
otts           -0.14
pró            -0.14
lemen          -0.13
ha             -0.13
acquaintance   -0.13
gebn           -0.13
Positive Logits
Token          Logit
violations     0.31
violation      0.30
viol           0.27
Viol           0.27
abuses         0.26
viol           0.23
defenders      0.23
abuse          0.23
defender       0.22
violated       0.22
Activations Density 0.026%