Explanations
graphic depictions of violence and brutality
Negative Logits
ipa        -0.20
rouch      -0.17
Dangerous  -0.16
exhaust    -0.15
ゃ         -0.14
temper     -0.13
anes       -0.13
anik       -0.13
378        -0.13
ption      -0.13
Positive Logits
organs     0.26
dec        0.24
gore       0.23
severed    0.23
entr       0.22
limbs      0.22
dis        0.21
hacked     0.21
киш        0.20
/org       0.20
Activations Density 0.163%