INDEX
Explanations
instances of violence or gun-related incidents
Negative Logits
arbeit     -0.17
rete       -0.16
ptime      -0.16
ameleon    -0.16
_FAULT     -0.15
imits      -0.15
tridge     -0.15
regn       -0.15
ÑĦÑĤ       -0.15
ionage     -0.14
Positive Logits
OLE        0.16
Patron     0.15
Manning    0.15
patron     0.14
OCI        0.14
903        0.14
ray        0.14
appId      0.14
cky        0.14
Gad        0.14
Activations Density: 0.041%