INDEX
Explanations
references to the organization Amnesty International
references to Amnesty International
Negative Logits
Token       Logit
opic        -0.73
agers       -0.71
adal        -0.66
caffe       -0.64
ancest      -0.64
driver      -0.62
MTA         -0.62
impro       -0.62
haul        -0.62
gradient    -0.61
Positive Logits
Token            Logit
nesty            1.21
Amnesty          0.93
International    0.80
Chomsky          0.76
Choice           0.75
endi             0.73
Rights           0.72
International    0.71
ij士              0.71
ileaks           0.70
Activations Density: 0.018%