INDEX
    Explanations

    references to various types of threats

    Negative Logits
    Token         Logit
    "urses"       -0.84
    "ricks"       -0.83
    "tein"        -0.73
    "arist"       -0.71
    "raham"       -0.68
    "ools"        -0.68
    "gian"        -0.68
    " gown"       -0.67
    "ributes"     -0.67
    "uties"       -0.67
    Positive Logits
    Token         Logit
    " posed"      1.27
    " threat"     0.99
    " threats"    0.94
    "threat"      0.87
    " emanating"  0.82
    " Threat"     0.81
    " glare"      0.77
    "lessly"      0.76
    "xual"        0.75
    "sov"         0.75
    Activation Density: 0.024%

    No Known Activations