INDEX
    Explanations

    phrases requesting human verification

    requests or prompts for user actions

    Negative Logits
    pires    -0.79
    laus     -0.72
    rist     -0.67
    pher     -0.67
    itive    -0.67
    borgh    -0.65
    visor    -0.65
    kept     -0.63
    amus     -0.62
    bent     -0.62
    Positive Logits
     verify       0.91
     Subscribe    0.81
     email        0.79
     enter        0.76
     enable       0.76
     Ignore       0.76
     login        0.76
    contact       0.74
     disregard    0.73
     inquire      0.73
    Act Density 0.011%

    No Known Activations