Explanations
references to human rights
references to human rights issues
Negative Logits
Arc          -0.79
BALL         -0.78
STRUCT       -0.71
Milky        -0.67
Suggest      -0.67
ALSE         -0.66
ERSON        -0.66
HEAD         -0.65
Ballard      -0.64
Magnetic     -0.64

Positive Logits
rights        1.10
abuses        1.08
rights        1.02
nesty         0.93
activists     0.90
ktop          0.89
Rights        0.88
violations    0.85
lawyers       0.81
protections   0.81
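
The dashboard does not state how these logit values are derived. A common convention for sparse-autoencoder feature dashboards, assumed here and not confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the tokens with the largest and smallest results. The sketch below shows only that ranking step in plain Python. The helper names `logit_weights` and `top_and_bottom` and the toy vectors are hypothetical, the numbers do not reproduce the values above, and the sketch ignores any final layer normalization a real model would apply.

```python
from typing import Dict, List, Sequence, Tuple

def logit_weights(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of a feature's decoder direction
    with each token's unembedding vector (final layer norm ignored)."""
    weights = {}
    for tok, vec in unembed.items():
        if len(vec) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        weights[tok] = sum(d * u for d, u in zip(decoder_dir, vec))
    return weights

def top_and_bottom(weights: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (largest first) and the k most
    negative (smallest first) token weights."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom

if __name__ == "__main__":
    # Toy 2-dimensional example with placeholder tokens (not real model data).
    decoder_dir = [1.0, 0.5]
    unembed = {
        "tok_a": [1.0, 0.2],
        "tok_b": [0.9, 0.3],
        "tok_c": [-0.7, -0.2],
        "tok_d": [-0.5, -0.4],
    }
    top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Under this reading, the two separate `rights` entries would be distinct vocabulary tokens that share a surface spelling, each ranked independently.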
Activation Density: 0.029%
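
Activation density most plausibly means the percentage of tokens in the evaluation data on which the feature's activation is positive. That reading is an assumption, since the dashboard does not define the metric. Under it, 0.029% means the feature fires on roughly 1 token in 3,450. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature
    activation is strictly greater than `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total

if __name__ == "__main__":
    # Toy data: 29 firing tokens out of 100,000.
    acts = [0.0] * 99_971 + [1.0] * 29
    print(f"{activation_density(acts):.3f}%")
```

The `threshold` parameter is included because some tools count only activations above a small cutoff. The default of 0.0 matches the plain "nonzero" reading.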