Explanations
words related to criminal activities such as mugging and assault
references to mugging incidents or related violent acts
Negative Logits
×Ļ×        -0.75
Domin      -0.71
edient     -0.71
IGH        -0.70
Doctrine   -0.69
ipher      -0.68
peak       -0.68
ISION      -0.68
cision     -0.66
Virgin     -0.66
Positive Logits
mug         1.17
gers        1.10
ging        1.00
shots       0.99
ger         0.96
shot        0.94
atures      0.87
ged         0.86
Mug         0.82
glers       0.81
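The negative and positive logit lists above are consistent with the usual logit-lens readout for a sparse-autoencoder feature: project the feature's decoder direction through the model's unembedding matrix and take the most negative and most positive entries. The dashboard does not state its exact definition, so treat the following as a minimal sketch under that assumption. `top_logit_tokens`, `decoder_dir`, and `W_U` are hypothetical names, and any numbers you feed it are illustrative, not this feature's real weights.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, n_vocab), so the feature's direct effect on each token's
    logit is decoder_dir @ W_U.
    """
    logits = decoder_dir @ W_U          # shape (n_vocab,)
    order = np.argsort(logits)          # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

With toy logits `[-0.7, 1.2, 0.1, 0.9]` over the vocabulary `["Domin", "mug", "the", "gers"]` and `k=2`, the positive list comes out as mug then gers, and the negative list as Domin then the.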
Activation density: 0.007%
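A density of 0.007% means the feature fires on roughly 7 of every 100,000 tokens. The dashboard does not say which token sample or activation threshold it uses. Assuming density is the percentage of sampled token positions where the activation exceeds zero, it could be computed as in the sketch below. `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: `activations` holds one activation value per sampled token.
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

For example, an activation array of 100,000 tokens with 7 nonzero entries yields about 0.007.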