INDEX
Explanations
concepts related to human rights
mentions of human rights
Negative Logits
HEAD        -0.79
URRENT      -0.75
-+-+-+-+    -0.73
-+-+        -0.71
BALL        -0.70
Plains      -0.66
STRUCT      -0.66
AST         -0.65
````        -0.64
ALSE        -0.64
Positive Logits
rights        1.35
rights        1.35
Rights        1.16
abuses        1.06
ktop          1.00
protections   0.89
tarians       0.86
yright        0.86
equality      0.85
distribut     0.85
Activations Density 0.020%