Explanations
References to police actions and the use of force

Negative Logits
Token     Logit
å·»       -0.15
гов       -0.15
çĿĽ       -0.14
loff      -0.14
amage     -0.14
hů        -0.14
ermint    -0.14
_APB      -0.13
ieux      -0.13
Wish      -0.13

Positive Logits
Token     Logit
pepper    0.33
TAS       0.33
Tas       0.32
tas       0.30
Pepper    0.29
bat       0.28
tas       0.28
stun      0.27
tear      0.23
Tear      0.23
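
The logit lists above rank vocabulary tokens by how strongly this feature pushes the model's output toward them (positive) or away from them (negative). The following is a minimal sketch, assuming the scores come from projecting the feature's decoder direction through the model's unembedding matrix. This is a common convention for SAE dashboards, but this page does not confirm it. The helpers `logit_effects` and `top_and_bottom` are hypothetical, and the toy vectors are made-up numbers rather than real model weights.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed_cols: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column (hypothetical helper)."""
    effects = {}
    for token, col in unembed_cols.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {token!r}")
        effects[token] = sum(d * w for d, w in zip(decoder_dir, col))
    return effects

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # ascending order: most negative first
    positive = ranked[::-1][:k]      # descending order: most positive first
    return positive, negative

# Toy 2-dimensional example; the numbers are invented, not real model weights.
decoder_dir = [1.0, 0.0]
unembed_cols = {"pepper": [0.33, 0.1], "stun": [0.27, 0.0], "Wish": [-0.13, 0.5]}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed_cols), k=1)
print(positive, negative)
```

Under this reading, tokens such as "pepper", "stun", and "tear" sit at the top because their unembedding columns point most nearly in the same direction as the feature.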

Activation density: 0.041%
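
Activation density is typically read as the share of sampled tokens on which the feature fires. Under that reading, 0.041% corresponds to about 41 tokens out of every 100,000, so this is a sparse feature. The sketch below assumes a feature "fires" when its activation exceeds a threshold of 0. The function name `activation_density` and the default threshold are illustrative assumptions, not values taken from this dashboard.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 41 firing tokens among 100,000 sampled tokens.
acts = [0.0] * 99_959 + [1.0] * 41
print(f"{activation_density(acts):.3f}%")
```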