INDEX
Explanations
phrases related to physical harm or violence
terms related to bodily harm and injuries
Negative Logits
gered       -0.74
rieg        -0.73
olini       -0.71
owitz       -0.69
Clover      -0.69
kers        -0.67
âϦ         -0.65
night       -0.65
effective   -0.64
Kafka       -0.63
Positive Logits
fluids      0.97
bodily      0.91
puter       0.85
injury      0.79
incorpor    0.78
organs      0.77
tissues     0.76
hesda       0.76
tradem      0.75
anatomy     0.75
Activations Density: 0.006%