INDEX
Explanations
references to violent events and firearm usage
Negative Logits
orners        -0.16
xit           -0.15
bruises       -0.14
åĭ            -0.14
685           -0.14
arsed         -0.14
imension      -0.14
ãĥ¢ãĥ³        -0.13
inflate       -0.13
owski         -0.13
Positive Logits
fire          0.34
firing        0.34
fired         0.31
-fire         0.31
fire          0.27
gunfire       0.26
Fire          0.25
shots         0.25
_fire         0.24
Fire          0.24
Activations Density 0.088%