INDEX
    Explanations

    phrases requesting user action or providing instructions

    requests for user input or confirmation

    Negative Logits
        Token         Logit
        "itive"       -0.61
        "driving"     -0.55
        "laus"        -0.54
        "words"       -0.52
        " academ"     -0.52
        " mole"       -0.51
        "amus"        -0.51
        "ulz"         -0.50
        "tro"         -0.49
        " Barker"     -0.49

    Positive Logits
        Token         Logit
        " Cancel"      0.85
        " verify"      0.70
        " refresh"     0.69
        " login"       0.64
        " Subscribe"   0.63
        " try"         0.63
        " subscribe"   0.62
        " Refresh"     0.62
        " reuse"       0.62
        " inbox"       0.62
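    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token results. The dashboard does not state its exact method, so the following is only a minimal sketch under that assumption: `w_dec`, `W_U`, and `top_logit_effects` are hypothetical names, only the direct path is considered, and no normalization is applied.

```python
from typing import List, Tuple

def top_logit_effects(
    w_dec: List[float],          # hypothetical decoder direction, length d_model
    W_U: List[List[float]],      # hypothetical unembedding, shape [d_model][vocab_size]
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive direct logit effects.

    Assumes the effect on token t is the dot product of the decoder
    direction with column t of the unembedding matrix.
    """
    d_model = len(w_dec)
    # effect[t] = sum_i w_dec[i] * W_U[i][t]
    effects = [
        sum(w_dec[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, effects))
    # Most negative first, then most positive first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive

# Toy data only; these values are not taken from the dashboard.
vocab = [" Cancel", "itive", " verify"]
W_U = [[0.5, -0.2, 0.9], [3.0, 3.0, 3.0]]
neg, pos = top_logit_effects([1.0, 0.0], W_U, vocab, k=1)
print(neg, pos)  # prints: [('itive', -0.2)] [(' verify', 0.9)]
```

    Keeping the token strings verbatim, including any leading space, matters because " subscribe" and "subscribe" are distinct vocabulary entries.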
    Activation density: 0.011%
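    If activation density is taken to mean the fraction of sampled tokens on which the feature's activation is strictly positive (an assumption, since the dashboard does not define it), then 0.011% corresponds to roughly 1 firing token in every 9,100. The sketch below computes that fraction. `activation_density` and its `threshold` parameter are hypothetical, and the data is synthetic.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Synthetic data: 11 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9_000] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints: 0.011%
```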

    No Known Activations