    Explanations

    phrases and words related to dangers and potential harm

    Negative Logits
    Token           Logit
    "uinal"         -0.67
    "inyin"         -0.63
    "Absorption"    -0.62
    "idase"         -0.62
    " scolas"       -0.61
    "{~"            -0.61
    "ertos"         -0.60
    "AILED"         -0.60
    " ayrı"         -0.59
    "ombus"         -0.59
    Positive Logits
    Token           Logit
    " threat"       1.85
    "threat"        1.80
    " Threat"       1.75
    " threats"      1.75
    " Threats"      1.69
    "Threat"        1.68
    " threatened"   1.50
    "Threats"       1.47
    " threatens"    1.47
    " threaten"     1.40
    Activation Density: 0.040%

    No Known Activations