    Explanations

    references to safety events and practices in a professional context

    Negative Logits
    Token        Logit
    "locator"    -0.17
    "kowski"     -0.17
    "pitch"      -0.16
    "stin"       -0.15
    "ÃľR"        -0.15
    "obili"      -0.14
    "angkan"     -0.14
    " pitch"     -0.14
    "ech"        -0.14
    "695"        -0.14
    Positive Logits
    Token        Logit
    " safety"    0.35
    " Safety"    0.34
    "Safety"     0.31
    " hazards"   0.26
    " hazard"    0.26
    " Unsafe"    0.25
    " Hazard"    0.25
    " unsafe"    0.25
    "afety"      0.24
    " safer"     0.23
    Activation Density: 0.040%

    No Known Activations