INDEX
Explanations
references to suicidal actions or thoughts
references to suicidal individuals and suicidal behavior
Negative Logits
apparel    -0.75
Rew        -0.71
Corn       -0.70
iture      -0.69
Renew      -0.69
orns       -0.68
itures     -0.66
Foss       -0.65
Vel        -0.65
Prov       -0.65
Positive Logits
suicidal       2.65
autistic       2.11
delusional     1.96
bipolar        1.92
psychotic      1.89
paranoid       1.84
narcissistic   1.72
schizophren    1.62
sociop         1.59
icidal         1.54
Activations Density: 0.053%