INDEX
Explanations
Information related to human rights abuses and atrocities, particularly solitary confinement and violent acts.
Negative Logits
inventoryQuantity  -0.92
hindsight          -0.81
Prediction         -0.78
soType             -0.76
bullish            -0.76
Effective          -0.76
sense              -0.75
commenter          -0.74
Seller             -0.72
prediction         -0.72
Positive Logits
raped         1.09
tortured      1.08
torture       1.03
raped         1.02
raping        0.98
deprived      0.98
forcibly      0.95
dehuman       0.95
deprivation   0.95
interrogated  0.95
Activations Density 0.385%