Explanations
words related to torture and suffering
Negative Logits
Token          Logit
Hitman         -0.70
donor          -0.67
··             -0.64
itably         -0.64
lihood         -0.62
ity            -0.62
Hung           -0.59
Conservative   -0.57
Leone          -0.57
assumption     -0.57
Positive Logits
Token          Logit
urous          1.64
urers          1.46
uring          1.46
ures           1.44
ured           1.38
orial          1.32
oise           1.21
urable         1.21
urer           1.19
eur            1.09
Activations Density: 0.162%