Explanations
objects typically associated with violence, specifically knives
mentions of knives in various contexts
Negative Logits
Token       Logit
mberg       -0.88
rian        -0.80
tainment    -0.76
organ       -0.75
rians       -0.74
phas        -0.73
ADRA        -0.73
rium        -0.72
oral        -0.72
oday        -0.72
Positive Logits
Token       Logit
knives      1.19
knife       1.11
knife       1.10
blades      1.04
scissors    1.02
blade       1.01
Knife       0.99
slicing     0.94
wielding    0.89
claws       0.87
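Positive and negative logit lists like the two above are often produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below shows that ranking step. It is an assumption that this dashboard uses exactly this procedure (for example, whether a final LayerNorm scale is folded in is not stated), and `top_logit_tokens`, `decoder_dir`, and `W_U` are hypothetical names.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (illustrative sketch).

    decoder_dir: (d_model,) decoder vector of the feature (hypothetical input).
    W_U: (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab: list of token strings, indexed like the columns of W_U.
    Returns (positive, negative): the k highest- and k lowest-scoring tokens.
    """
    scores = decoder_dir @ W_U          # (vocab_size,) logit contribution per token
    order = np.argsort(scores)          # ascending order of scores
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: identity unembedding, so each score equals one decoder entry.
vocab = ["knife", "blade", "oday", "rian"]
W_U = np.eye(4)
decoder_dir = np.array([1.2, 1.0, -0.7, -0.8])
positive, negative = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(positive)
print(negative)
```

With a real model, `W_U` would have one column per vocabulary entry, which is why subword fragments such as "mberg" or "oday" can appear in the lists.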
Activation density: 0.018%
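Activation density is typically the percentage of token positions on which the feature fires (activation above zero), so 0.018% corresponds to roughly 18 firing tokens per 100,000. The sketch below computes that quantity from an array of per-token activations; the zero threshold and the function name `activation_density` are assumptions for illustration, not details taken from this dashboard.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    The default threshold of 0.0 is an assumption (a feature "fires" when > 0).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 18 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:18] = 0.5
density = activation_density(acts)
print(f"{density:.3f}%")
```

A density this low indicates a sparse feature, consistent with one that fires mainly on knife-related contexts.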