INDEX
    Explanations

    explicit mentions of severe or extreme situations or conditions

    words related to serious and urgent situations

    Negative Logits
        Token         Logit
        "adesh"       -0.81
        "obbies"      -0.81
        "nesota"      -0.74
        "andise"      -0.74
        "ACP"         -0.74
        "adding"      -0.74
        "adr"         -0.71
        "orthy"       -0.70
        "onew"        -0.69
        "ipop"        -0.68
    Positive Logits
        Token             Logit
        " dire"           0.90
        "ly"              0.89
        "gency"           0.82
        " consequences"   0.81
        "bly"             0.78
        " earthqu"        0.76
        "wolf"            0.76
        "wolves"          0.74
        "LY"              0.74
        " predic"         0.74
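The page does not say how the logit lists were produced. Dashboards of this kind commonly project the feature's decoder direction onto the model's unembedding matrix and report the lowest- and highest-scoring tokens; the sketch below assumes that approach. The helper name `top_logit_tokens`, the toy vocabulary, and the toy matrices are all hypothetical, chosen only to show the mechanics. Leading spaces in tokens such as " dire" are part of the token string.

```python
import numpy as np

def top_logit_tokens(w_dec_f, W_U, vocab, k=3):
    """Hypothetical helper (assumed method): score every vocabulary token by
    projecting one feature's decoder vector through the unembedding, then
    return the k most negative and k most positive (token, score) pairs."""
    scores = w_dec_f @ W_U          # shape: (vocab_size,)
    order = np.argsort(scores)      # ascending: most negative first
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
vocab = [" dire", "adesh", "x", "gency"]
w_dec_f = np.array([1.0, 0.0])
W_U = np.array([[0.9, -0.8, 0.1, 0.5],
                [0.0,  0.0, 0.0, 0.0]])

neg, pos = top_logit_tokens(w_dec_f, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With real weights, `k=10` would yield two ten-row lists shaped like the tables above.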
    Activation density: 0.011%
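An activation density of 0.011% means roughly 11 activating tokens per 100,000 tokens, so the feature fires very rarely. The sketch below assumes density is defined as the percentage of tokens whose activation is above zero; the page does not state its exact definition or threshold, and the function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition; the dashboard may use a different one)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy data: one active token out of 10,000 gives a density of 0.01%.
acts = np.zeros(10_000)
acts[42] = 3.5
print(f"{activation_density(acts):.3f}%")  # prints 0.010%
```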

    No Known Activations