INDEX
Explanations
mentions of physical actions or incidents, particularly those involving physical altercations or force
Negative Logits
ONSORED      -0.83
Reviewer     -0.82
)=(          -0.79
theless      -0.72
Dispatch     -0.68
mus          -0.68
ulative      -0.66
simultane    -0.66
DOC          -0.66
Handling     -0.64
Positive Logits
omsday       1.26
herty        1.22
ppel         1.21
gging        1.07
ctr          1.02
lez          1.00
oms          0.96
ozy          0.95
ctors        0.95
pez          0.94
Activations Density: 0.041%