INDEX
    Explanations

    requests for action or assistance

    instances of the word "please."

    Negative Logits
    ARC              -0.79
    arc              -0.72
    advertisement    -0.69
    senal            -0.66
    existence        -0.65
    ription          -0.64
     サーティワン    -0.64
    perty            -0.62
    alone            -0.62
    é¾               -0.61
    Positive Logits
     sir       0.90
    lyak       0.78
     pardon    0.77
     please    0.77
     note      0.75
     kindly    0.74
     excuse    0.74
     fill      0.73
     Ignore    0.73
     Please    0.73
    Act Density 0.013%

    No Known Activations