INDEX
    Explanations

    phrases indicating instructions or requests in a polite manner

    requests or prompts for action

    Negative Logits
    arthed      -0.75
    pires       -0.73
    Tok         -0.68
    76561       -0.68
    Aust        -0.67
    senal       -0.67
    MpServer    -0.67
    arc         -0.67
    é¾          -0.67
    byss        -0.66
    Positive Logits
     note        1.16
     refrain     1.09
     forgive     1.08
     Ignore      1.07
     consider    1.07
     enable      1.07
     excuse      1.01
     beware      1.00
     refer       0.99
     disregard   0.99
    Activation Density: 0.031%

    No Known Activations