    Explanations

    themes of suicide and self-harm

    Negative Logits
    Token           Logit
    "exus"          -0.18
    "urch"          -0.15
    "agina"         -0.15
    " Sponsored"    -0.14
    ".learn"        -0.14
    "iert"          -0.13
    " sponsored"    -0.13
    " spons"        -0.13
    "trap"          -0.13
    "acle"          -0.13
    Positive Logits
    Token           Logit
    " suicide"      0.64
    " Suicide"      0.56
    "su"            0.56
    " suicides"     0.54
    "Su"            0.54
    "-su"           0.54
    " Su"           0.52
    " commit"       0.52
    "_su"           0.50
    " suic"         0.49
    Activation Density: 0.175%

    No Known Activations