INDEX
    Explanations

    phrases prompting the user to take action

    phrases instructing the reader to take action or access information

    Negative Logits
    "pard"         -0.67
    " defe"        -0.60
    " withd"       -0.57
    " resc"        -0.55
    " taboo"       -0.55
    "handedly"     -0.54
    " disson"      -0.54
    " palate"      -0.54
    " experiment"  -0.54
    " lik"         -0.53
    Positive Logits
    " rid"                   1.21
    "TING"                   1.14
    "cloneembedreportprint"  0.94
    "away"                   0.92
    "aways"                  0.89
    " Started"               0.81
    " Tickets"               0.79
    " Rid"                   0.78
    " notified"              0.77
    " acquainted"            0.76
    Activation Density: 0.041%

    No Known Activations