INDEX
    Explanations

    expressions related to self-harm and suicide

    Head Attr Weights
    Head    Weight
    0       0.02
    1       0.02
    2       0.09
    3       0.09
    4       0.02
    5       0.03
    6       0.05
    7       0.10
    8       0.16
    9       0.17
    10      0.05
    11      0.13
    Negative Logits
    Token           Logit
    "taboola"       -1.32
    "glers"         -1.27
    " Presents"     -1.23
    "ciplinary"     -1.22
    "CLUD"          -1.15
    "published"     -1.12
    "lished"        -1.08
    "spect"         -1.06
    "══"            -1.05
    "Statement"     -1.03
    Positive Logits
    Token           Logit
    " chunk"        1.15
    " unborn"       1.15
    " invaders"     1.10
    " ego"          1.08
    " damn"         1.08
    " dro"          1.06
    " crap"         1.06
    " shit"         1.05
    " pesky"        1.04
    " goddamn"      1.03
    Act Density 0.042%

    No Known Activations