Explanations
instances of violent actions, particularly related to shootings
Negative Logits
rove       -0.15
urer       -0.15
uke        -0.14
contra     -0.14
onium      -0.14
Cog        -0.14
ihat       -0.13
crement    -0.13
_DOT       -0.13
ÑĥÑĢи     -0.13
Positive Logits
سرد              0.17
еÑģÑı           0.15
BAL              0.15
ucks             0.14
725              0.14
ModelProperty    0.14
/INFO            0.13
FirstChild       0.13
abbo             0.13
SERIAL           0.13
Activations Density: 0.013%