INDEX
Explanations
words related to physical torture and abuse
references to torture and related activities
Negative Logits
soType            -0.84
ership            -0.75
explan            -0.65
iod               -0.64
lender            -0.64
nect              -0.64
ijk               -0.63
arger             -0.62
ouver             -0.61
soDeliveryDate    -0.61
Positive Logits
torture           0.91
imony             0.79
urous             0.78
tortured          0.74
rs                0.72
captives          0.72
APE               0.71
ABE               0.70
rers              0.70
confinement       0.67
Activations Density 0.025%