Explanations
phrases and terms related to human rights violations and organizations working to protect human rights
Negative Logits
agers    -0.82
opic     -0.71
ancest   -0.69
driver   -0.66
lass     -0.64
adal     -0.64
age      -0.61
pox      -0.61
stead    -0.61
fries    -0.61
Positive Logits
nesty            1.01
International    0.87
undo             0.75
International    0.73
ケ               0.67
Machina          0.66
Chomsky          0.66
endi             0.64
Choice           0.63
ileaks           0.63
Activations Density 0.021%