INDEX
    Explanations

    requests or instructions for action

    instances of the word "please" in various contexts

    Negative Logits
    Token            Logit
    "arc"            -0.78
    "MpServer"       -0.78
    "ARC"            -0.69
    "é¾"             -0.67
    "senal"          -0.67
    "mund"           -0.64
    "cler"           -0.64
    "phrine"         -0.64
    "UF"             -0.64
    " Huntington"    -0.63
    Positive Logits
    Token            Logit
    " advise"        0.90
    " excuse"        0.88
    " fill"          0.88
    " Ignore"        0.82
    " ignore"        0.82
    " note"          0.81
    " sir"           0.81
    " forgive"       0.79
    " refrain"       0.79
    " inquire"       0.78
    Activation Density: 0.015%

    No Known Activations