    Explanations

    apologies or expressions of regret

    instances of apology or expressions of regret

    Negative Logits
    Token          Logit
    "Ranked"       -0.79
    "tnc"          -0.74
    "iltration"    -0.71
    "arnaev"       -0.71
    "irrel"        -0.69
    "eele"         -0.68
    "edience"      -0.68
    "rouse"        -0.68
    "psey"         -0.67
    "minecraft"    -0.67
    Positive Logits
    Token          Logit
    " sorry"       1.08
    " excuse"      0.92
    "GES"          0.85
    "sorry"        0.85
    "Sorry"        0.80
    " Sorry"       0.74
    "giving"       0.73
    " apologies"   0.72
    "tm"           0.72
    " Customers"   0.69
    Activation Density: 0.014%

    No Known Activations