Explanations
mental health and suicide terms
Negative Logits
Token        Logit
byte         0.49
gr           0.48
atility      0.46
data         0.45
ard          0.45
าล           0.45
производи    0.44
ICLE         0.43
end          0.42
new          0.42
Positive Logits
Token        Logit
suicidal     1.29
depression   1.27
depresión    1.20
suicide      1.19
depressive   1.16
Suicide      1.16
suic         1.13
psychiatric  1.11
Depression   1.10
psychiat     1.09
Activations Density 0.440%