INDEX
    Explanations

    phrases or words related to unsafe situations or actions

    references to safety and the concept of being unsafe

    Negative Logits
    Token           Logit
    "orah"          -0.85
    "cence"         -0.84
    "phasis"        -0.81
    "ership"        -0.80
    "braska"        -0.80
    "bernatorial"   -0.80
    "gdala"         -0.80
    "ophers"        -0.80
    "zzo"           -0.80
    "thood"         -0.76

    Positive Logits
    Token           Logit
    " unsafe"        1.04
    " adolesc"       0.78
    "nesses"         0.72
    " unle"          0.67
    "NESS"           0.67
    " hazardous"     0.66
    "ÃįÃį"           0.62
    " Ukrain"        0.62
    " unhealthy"     0.61
    "IED"            0.60
    Activation Density: 0.016%

    No Known Activations