INDEX
    Explanations

    requests or instructions in the text

    Negative Logits
    pires       -0.76
    lings       -0.71
    é¾          -0.70
    visor       -0.69
    IUM         -0.69
    arc         -0.69
    cler        -0.69
    laus        -0.66
    MpServer    -0.65
    bent        -0.65

    Positive Logits
     Ignore      1.00
     forgive     0.96
     beware      0.96
     note        0.95
     excuse      0.95
     enable      0.90
     ignore      0.90
     refrain     0.89
     advise      0.89
     disregard   0.88
    Activation Density: 0.433%

    No Known Activations