    Explanations

    meta-instructions or requests emphasizing actions that should or should not be taken

    instructions and requests for compliance in online interactions

    Negative Logits
    azeera                -0.71
    Ö¼                    -0.71
    MpServer              -0.70
    ilogy                 -0.68
    ufact                 -0.64
    culosis               -0.63
    ailability            -0.63
    aturated              -0.60
    ãĥĬ                   -0.60
    quickShipAvailable    -0.59
    Positive Logits
     responsibly    0.70
     sir            0.68
    iquette         0.67
     sacrific       0.66
     Submit         0.65
     politely       0.65
     PLEASE         0.64
     Refresh        0.64
     reprint        0.63
     yourselves     0.62
    Activation Density: 0.083%

    No Known Activations