    Explanations

    words related to threats or harm

    phrases indicating potential threats or harm to individuals or society

    Negative Logits
    soDeliveryDate    -0.89
    ihad              -0.71
    arten             -0.70
    tions             -0.67
    need              -0.66
    anooga            -0.63
    sponsored         -0.62
    iHUD              -0.61
     Cosponsors       -0.60
    hent              -0.60
    Positive Logits
     injure       0.86
    adies         0.78
     ensure       0.76
     compensate   0.76
     avoid        0.75
    asted         0.75
     assist       0.74
     enhance      0.74
     reduce       0.74
     achieve      0.73
    Act Density 0.167%

    No Known Activations