Explanations
violent and medical-related concepts, including blood, cancer, and darkness
references to violence and death
Negative Logits
elfth          -0.69
hander         -0.67
Dispatch       -0.66
oday           -0.66
Publication    -0.66
ASA            -0.64
ACTED          -0.64
encer          -0.64
Participant    -0.63
peer           -0.62
Positive Logits
feces              0.81
rubble             0.80
dylib              0.78
goodies            0.74
vomit              0.74
flo                0.69
indistinguishable  0.69
garbage            0.68
metaphors          0.68
unimaginable       0.67
Activations Density: 0.409%