Explanations
violent and graphic descriptions involving bodily harm
graphic descriptions of violence and injuries
Negative Logits
ivals       -0.92
anguage     -0.72
vae         -0.69
Families    -0.69
Franch      -0.68
agents      -0.67
levard      -0.67
Bus         -0.67
Reign       -0.65
Pillar      -0.65
Positive Logits
protr          0.97
hers           0.93
amput          0.91
stretched      0.90
swollen        0.90
abnorm         0.86
invol          0.84
clenched       0.82
numb           0.80
Shutterstock   0.79
Activations Density: 0.268%