    Explanations

    phrases related to actions of self-harm or harm towards others

    references to self-harm and suicide

    Negative Logits
        Token                Logit
        "taboola"            -0.86
        "soType"             -0.77
        "eret"               -0.74
        "Management"         -0.72
        "Wide"               -0.72
        "姫"                 -0.71
        "issance"            -0.71
        "DragonMagazine"     -0.70
        "UMP"                -0.69
        "FML"                -0.68
    Positive Logits
        Token                Logit
        " unborn"             0.90
        " whales"             0.75
        " innocent"           0.74
        " hated"              0.74
        " terrorists"         0.73
        " classmate"          0.73
        " Mum"                0.71
        " chickens"           0.71
        " birds"              0.70
        " zombies"            0.70
    Act Density: 0.183%
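Activation density is usually read as the share of token positions on which the feature fires, so 0.183% corresponds to roughly 183 out of every 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero; the dashboard does not give its threshold, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.
    Assumption: a feature 'fires' when its activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 183 firing positions out of 100,000 tokens gives a density of 0.183%.
sample = [1.0] * 183 + [0.0] * (100_000 - 183)
density = activation_density(sample)
```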

    No Known Activations