    Explanations

    phrases requesting actions or conveying politeness

    requests for action or feedback

    Negative Logits
    "arc"          -0.77
    "ARC"          -0.76
    "advertisement" -0.73
    "ription"      -0.69
    "senal"        -0.68
    "existence"    -0.67
    "ula"          -0.64
    "phrine"       -0.63
    "mund"         -0.61
    "bang"         -0.60

    Positive Logits
    " sir"          0.92
    " Ignore"       0.80
    " fill"         0.78
    " pardon"       0.77
    " ignore"       0.76
    " inquire"      0.74
    " excuse"       0.74
    " forgive"      0.73
    " beware"       0.72
    "ignore"        0.71
    Activation Density: 0.013%

    No Known Activations
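    How the logit lists and the density figure above are derived is not stated on this page. A common convention, assumed here, is that a token's logit score is the dot product of the feature's decoder direction with that token's unembedding vector, and that activation density is the percentage of tokens on which the feature fires (activation above zero). Under that assumption, a density of 0.013% means roughly 13 active tokens per 100,000. The helpers `token_scores`, `logit_extremes`, and `activation_density` are hypothetical, and the toy vectors are illustrative only, not this feature's weights.

```python
from typing import Dict, List, Tuple


def token_scores(decoder_row: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction with each token's unembedding."""
    return {tok: sum(d * u for d, u in zip(decoder_row, col)) for tok, col in unembed.items()}


def logit_extremes(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy, made-up 2-dimensional unembeddings for three tokens.
    unembed = {"sir": [1.0, 0.0], "arc": [-1.0, 0.0], "fill": [0.5, 0.5]}
    neg, pos = logit_extremes(token_scores([0.9, 0.1], unembed), k=1)
    print(neg, pos)
    print(activation_density([0.0, 0.0, 2.1, 0.0]))
```

    In a real model the scores would come from matrix products over the full vocabulary; plain lists are used here only to keep the sketch dependency-free.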