Explanations
incidents involving violence and injury
Negative Logits
Rough          -0.17
rough          -0.15
елем           -0.14
ouve           -0.14
statt          -0.14
Act            -0.14
cale           -0.14
statistical    -0.14
combe          -0.14
haps           -0.13
Positive Logits
hit            0.21
shot           0.20
bullet         0.20
shots          0.19
hits           0.19
Hits           0.19
-hit           0.19
Shot           0.18
shot           0.17
bullet         0.17
Activations Density: 0.072%