Explanations
references to suicide and suicidal behavior
Negative Logits
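suicide      -1.95
suicide      -1.74
Suicide      -1.73
Suicide      -1.55
suicides     -1.47
suicidio     -1.20   (Spanish/Italian/Portuguese: "suicide")
suicidal     -1.16
自殺         -1.12   (Japanese / traditional Chinese: "suicide")
自杀         -1.08   (simplified Chinese: "suicide")
suic         -1.03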
Positive Logits
难           0.34   (Chinese: "difficult")
Failing      0.34
Failing      0.34
sore         0.33
hot          0.32
jardin       0.32   (French: "garden")
zah          0.31
table        0.31
pio          0.31
hot          0.31
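As a rough illustration of how lists like the ones above are usually produced, the sketch below assumes each value is the feature's direct effect on a token's logit: the dot product of the feature's decoder direction with that token's column of the unembedding matrix. It then ranks the tokens and keeps the k most negative and k most positive. The functions `logit_effects` and `top_k` and the toy vectors are hypothetical, made up for illustration, and are not taken from any real model or dashboard code.

```python
from typing import Dict, List, Tuple

def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(direction, vec)) for tok, vec in unembed.items()}

def top_k(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending by effect
    return negative, positive

# Toy 2-dimensional example with made-up vectors (not real model weights).
direction = [1.0, -0.5]
unembed = {
    "a": [-2.0, 0.0],  # effect -2.0
    "b": [0.5, 0.25],  # effect 0.5 - 0.125 = 0.375
    "c": [0.0, 1.0],   # effect -0.5
    "d": [0.25, 0.0],  # effect 0.25
}
neg, pos = top_k(logit_effects(direction, unembed), k=2)
print(neg)  # [('a', -2.0), ('c', -0.5)]
print(pos)  # [('b', 0.375), ('d', 0.25)]
```

In a real model the direction and unembedding vectors have thousands of dimensions and the vocabulary has tens of thousands of tokens, but the ranking step is the same.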
Activation Density: 0.002%
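Assuming activation density means the percentage of sampled tokens on which the feature's activation is above zero, a density of 0.002% corresponds to roughly 2 firing tokens per 100,000. The helper `activation_density` below is a hypothetical sketch of that calculation, not the dashboard's actual code.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 100,000 tokens, of which exactly two activate the feature.
acts = [0.0] * 100_000
acts[10] = 3.2
acts[500] = 1.1
print(activation_density(acts))  # 0.002
```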