INDEX
    Explanations

    terms related to suicide and self-harm

    Negative Logits
    anzi          -0.19
    ycz           -0.16
     Dud          -0.14
    éĢł           -0.14
    æ°¸ä¹ħ        -0.14
     Criminal     -0.13
    ValuePair     -0.13
    اØŃØ©         -0.13
    ye            -0.13
    ุà¸ķ          -0.13
    Positive Logits
    /self         0.20
    alex          0.15
    apas          0.15
     dokon        0.15
     Ordered      0.14
    uars          0.14
     Liver        0.14
    æ½®           0.14
    eros          0.14
    ãĥ¶           0.14
    Activation Density: 0.013%

    No Known Activations