INDEX
Explanations
references to human rights organizations, specifically "HRW (Human Rights Watch)"
references to human rights organizations and their activities
Negative Logits
lihood      -0.88
struct      -0.79
cart        -0.74
eu          -0.69
ç¥ŀ         -0.69
cence       -0.68
Wand        -0.67
sta         -0.67
eers        -0.67
Tasman      -0.67
Positive Logits
RR          1.03
senal       1.03
utherford   0.93
Ds          0.91
VO          0.91
ANGE        0.87
ands        0.87
APH         0.87
anges       0.85
andom       0.84
Activations Density 0.044%