INDEX
Explanations
words related to physical injury, particularly stabbings
references to stabbings or knife-related injuries
Negative Logits
gart     -0.85
zeb      -0.84
anamo    -0.76
ayson    -0.75
amina    -0.72
usk      -0.70
cules    -0.68
ained    -0.68
oses     -0.66
umen     -0.65
Positive Logits
lished   0.91
ritch    0.74
rers     0.74
ival     0.73
¾        0.71
¼        0.71
nesday   0.70
hold     0.70
ters     0.69
TERN     0.68
Activations Density: 0.105%