INDEX
    Explanations

    descriptions or mentions of injuries

    mentions of injuries or harm caused to individuals

    Negative Logits
    gency            -0.79
    gran             -0.71
    perm             -0.70
    minist           -0.69
    ingen            -0.65
     ellipt          -0.63
     algorithm       -0.62
    arten            -0.62
    SpaceEngineers   -0.61
    ramid            -0.61
    Positive Logits
    jured            0.88
     survivors       0.85
     Survivors       0.81
     injured         0.80
     victims         0.79
     bystanders      0.78
    adoes            0.76
     wounded         0.76
     injuring        0.75
    ../              0.75
    Act Density 0.022%

    No Known Activations