INDEX
Explanations
references to violent crimes and the individuals involved in them
Negative Logits
ipp          -0.15
essler       -0.15
violence     -0.14
idable       -0.14
viol         -0.14
ãĥ¼ãĥ«ãĥī       -0.14
bey          -0.13
eza          -0.13
             -0.13
.toBe        -0.13
Positive Logits
Wich         0.19
Pvt          0.18
pong         0.17
.cf          0.17
ÑĢок         0.16
Mgr          0.15
undles       0.15
วรร          0.15
ritch        0.14
apsed        0.14
Activations Density 0.248%