INDEX
Explanations
occurrences of the word "torture" or its variations
references to torture
Negative Logits
lihood          -0.77
Darkness        -0.73
magnification   -0.73
Prospect        -0.70
IST             -0.68
donor           -0.67
brightest       -0.65
Farn            -0.65
è£ħ             -0.64
livest          -0.62
Positive Logits
urous           1.44
oise            1.40
illas           1.14
eur             1.14
ured            1.06
uous            1.03
urers           1.03
imer            1.02
uring           0.99
oled            0.99
Activations Density 0.021%
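
As a rough illustration of how a density figure like this one might be read, the sketch below assumes that density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample values are all assumptions made for illustration; none of them come from the page itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 2 of which fire.
sample = [0.0] * 9998 + [1.3, 0.4]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.021% would correspond to the feature firing on roughly 2 out of every 10,000 tokens.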