    Explanations

    AI refusal or disclaimer

    Negative Logits
    " dig"          0.42
    " cs"           0.38
    "ছা"            0.36
    " ssl"          0.35
    " backwards"    0.35
    " stack"        0.34
    " damned"       0.34
    "ちゃんと"       0.34
    " pegs"         0.34
    " esper"        0.34

    Positive Logits
    "Sorry"         0.60
    "sorry"         0.58
    "Disclaimer"    0.57
    "This"          0.56
    "Dear"          0.56
    " Sorry"        0.56
    " Disclaimer"   0.52
    "मैं"            0.51
    " cautioned"    0.51
    " sorry"        0.49
    Act Density 0.013%

    No Known Activations