Explanations
Terms related to suicide and self-harm
Negative Logits
Personendaten  -0.57
rö             -0.49
dinosau        -0.41
dino           -0.41
hou            -0.40
Ladybug        -0.40
Yandex         -0.39
dyn            -0.39
Beur           -0.39
CodeDom        -0.39
Positive Logits
suicide        1.32
suicide        1.17
Suicide        1.11
suicidio       1.02
Suicide        1.01
suicides       1.00
suicidal       0.97
自杀           0.90
自殺           0.85
suic           0.79
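Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes exactly that plain dot product, with no layer-norm folding or other scaling the dashboard may actually apply. The functions `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and are not taken from this feature.

```python
from typing import List, Tuple


def logit_effects(
    decoder_vec: List[float],
    unembed: List[List[float]],
    vocab: List[str],
) -> List[Tuple[str, float]]:
    """Score every vocabulary token by projecting the feature's decoder
    direction onto the unembedding matrix (d_model rows x vocab_size columns).

    Assumption: score = dot(decoder_vec, unembed[:, t]) with no extra scaling.
    """
    if len(unembed) != len(decoder_vec):
        raise ValueError("unembed must have one row per model dimension")
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with token t's unembedding column.
        score = sum(d * row[t] for d, row in zip(decoder_vec, unembed))
        scores.append((token, score))
    return scores


def top_and_bottom(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted (positive) and k most-suppressed (negative) tokens."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy numbers invented for illustration only.
    vocab = ["suicide", "dino", "the"]
    decoder_vec = [1.0, 0.5]
    unembed = [[1.0, -0.5, 0.0], [0.6, 0.2, 0.1]]
    top, bottom = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=1)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting the full vocabulary twice is fine for a sketch; a real vocabulary of tens of thousands of tokens would more likely use a vectorized matrix product and a partial top-k selection.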
Activation density: 0.284%
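Activation density is commonly reported as the percentage of sampled tokens on which the feature fires, i.e. has an activation above zero. Assuming that definition, 0.284% corresponds to roughly 284 firing tokens per 100,000. A minimal sketch follows; `activation_density` and its `threshold` parameter are hypothetical names, and the dashboard's actual sampling and threshold may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density = (firing tokens / total tokens) * 100.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 284 firing tokens out of 100,000.
    sample = [0.0] * 99_716 + [1.0] * 284
    print(f"{activation_density(sample):.3f}%")  # prints 0.284%
```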