Explanations
references to violence and brutality, particularly in relation to historical events
Negative Logits
Token      Logit
Mint       -0.16
378        -0.15
ços        -0.14
åĽº        -0.13
çĹ         -0.13
ÙĨØ´       -0.13
enco       -0.13
ovel       -0.13
379        -0.13
ì¢         -0.13
Positive Logits
Token      Logit
dec         0.31
severed     0.29
cuts        0.28
åĪĩ         0.27
dissect     0.25
cut         0.25
ÙĤطع       0.25
hacked      0.24
mutil       0.24
sever       0.24
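Several tokens in the tables above (for example "åĪĩ" and "ÙĤطع") look garbled because they are shown as raw byte-level BPE strings. The following is a minimal sketch that turns them back into readable text. It assumes the model uses a GPT-2-style byte-level tokenizer, which the character patterns suggest but the dashboard does not state. It also assumes a heuristic: characters outside the byte table are already-decoded text and are re-encoded as UTF-8. The helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
from __future__ import annotations


def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable-character table used by GPT-2-style byte-level BPE."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Non-printable bytes are shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Recover readable text from a byte-level token string (assumed encoding)."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Heuristic: treat characters outside the table as already-decoded text.
            raw.extend(ch.encode("utf-8"))
    # Fragments of multi-byte characters (e.g. "çĹ") become U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["åĪĩ", "ÙĤطع", "åĽº", "ÙĨØ´"]:
        print(tok, "->", decode_token(tok))
    # åĪĩ -> 切, ÙĤطع -> قطع, åĽº -> 固, ÙĨØ´ -> نش
```

Under these assumptions, two of the top positive tokens decode to 切 (Chinese) and قطع (Arabic). Both mean "cut", which matches the English tokens beside them. Tokens such as "çĹ" and "ì¢" are only partial characters and cannot be fully recovered.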
Activations Density: 0.228%
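Activation density is commonly read as the fraction of sampled tokens on which the feature's activation is above zero. That definition is assumed here, because the dashboard does not spell it out. Under it, 0.228% means the feature fires on roughly 2.28 of every 1,000 tokens. The function name `activation_density` and its `threshold` parameter are illustrative.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Feature fires on one of four tokens.
    print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 0.25
```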