Explanations
references to violent incidents, specifically shootings
references to incidents of gun violence
Negative Logits
Token      Logit
hw         -0.82
Label      -0.79
undai      -0.74
ebook      -0.73
Yok        -0.72
ateg       -0.71
akuya      -0.70
MpServer   -0.70
GY         -0.70
ulla       -0.69
Positive Logits
Token      Logit
shooting   1.03
spree      1.01
shoot      0.93
Shooting   0.84
powder     0.82
nikov      0.80
shoots     0.80
Shoot      0.79
gallery    0.78
fireworks  0.77
Activations Density: 0.016%