INDEX
    Explanations

    words related to danger or potential harm

    references to danger or harmful situations

    Negative Logits
    "issance"      -0.81
    "ļéĨĴ"         -0.78
    "galitarian"   -0.73
    "guyen"        -0.73
    "ergy"         -0.71
    "owned"        -0.71
    "Ħ¢"           -0.70
    "elle"         -0.70
    "urally"       -0.69
    "eenth"        -0.68
    Positive Logits
    "danger"       0.95
    " Danger"      0.91
    " endanger"    0.82
    " danger"      0.82
    "ously"        0.82
    " dangers"     0.81
    " lur"         0.81
    " lurking"     0.78
    " dangerous"   0.74
    " peril"       0.73
    Activation Density: 0.054%

    No Known Activations