INDEX
    Explanations

    warnings or cautions expressed in texts

    warnings or cautions about potential risks

    Negative Logits
    "cess"           -0.87
    "fab"            -0.73
    "arp"            -0.72
    "rid"            -0.71
    "ID"             -0.69
    "Doctor"         -0.68
    "mut"            -0.67
    "func"           -0.67
    "bernatorial"    -0.66
    "ater"           -0.66
    Positive Logits
    " beware"        1.12
    " flock"         0.94
    " Beware"        0.86
    " lest"          0.79
    "rums"           0.78
    "eware"          0.77
    "theless"        0.74
    " wary"          0.73
    " heed"          0.71
    "ashtra"         0.71
    Activation Density: 0.029%

    No Known Activations