INDEX
Explanations
mention of the word "torture"
terms related to torture
Negative Logits
Darkness       -0.78
magnification  -0.75
Prospect       -0.69
ACP            -0.67
Silk           -0.66
ECB            -0.65
donor          -0.64
brightest      -0.64
Flavoring      -0.63
lihood         -0.63
Positive Logits
urous     1.45
oise      1.38
uous      1.06
uring     1.04
illas     1.01
ured      0.98
eur       0.98
oled      0.96
imer      0.94
ificate   0.93
Activations Density 0.012%