INDEX
    Explanations

    phrases indicating negligence and its consequences, particularly in the context of safety and liability

    Negative Logits
    Token          Logit
    "çĬ¯"          -0.18
    "272"          -0.17
    "æĹı"          -0.15
    "Trou"         -0.15
    "ardon"        -0.15
    " Trou"        -0.14
    "WithError"    -0.14
    "uffers"       -0.14
    "ounters"      -0.13
    "linger"       -0.13

    Positive Logits
    Token          Logit
    " loss"        0.18
    " outright"    0.18
    " lack"        0.16
    " lost"        0.16
    " bad"         0.16
    " downright"   0.16
    "loss"         0.16
    " threats"     0.14
    " missing"     0.14
    "Ñģли"         0.14
    Activation Density: 0.387%

    No Known Activations