Explanations
mentions of the word "torture"
references to torture
Negative Logits
Token            Logit
Darkness         -0.78
magnification    -0.75
donor            -0.68
lihood           -0.66
Farn             -0.65
Prospect         -0.64
brightest        -0.64
âĸ¬              -0.63
Manhattan        -0.63
FORE             -0.62
Positive Logits
Token            Logit
urous            1.43
oise             1.34
illas            1.08
uring            1.03
uous             1.01
ured             1.01
urers            0.98
eur              0.96
imer             0.94
ures             0.94
Activations Density 0.005%
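
For a sense of scale, a density of 0.005% corresponds to the feature firing on roughly 1 in 20,000 tokens. The sketch below shows one way such a figure could be computed, assuming density means the fraction of sampled tokens whose activation is above zero. The page does not state its exact definition, so the function name, the threshold, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active; the dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): one active token out of 20,000.
toy = [0.0] * 19_999 + [3.2]
density = activation_density(toy)
print(f"{density * 100:.3f}%")  # prints 0.005%
```

A feature this sparse is silent on almost every token, which fits a feature tied to a single word and its continuations.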