INDEX
    Explanations

    apologies or expressions of regret

    instances of the word "sorry" or variations of it

    Negative Logits
    "tnc"          -0.74
    "rouse"        -0.71
    "krit"         -0.68
    "arnaev"       -0.67
    "irrel"        -0.67
    "eele"         -0.67
    "tein"         -0.67
    "Ranked"       -0.66
    "minecraft"    -0.64
    " helicop"     -0.64
    Positive Logits
    " sorry"        1.11
    " excuse"       0.89
    "Sorry"         0.87
    "sorry"         0.86
    " Sorry"        0.84
    "GES"           0.83
    "giving"        0.78
    " apologies"    0.75
    " guys"         0.74
    " pardon"       0.71
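    A minimal sketch of how top positive and negative logit lists like the ones above can be produced. It assumes, rather than confirms from this page, that each score is the dot product of the feature's decoder vector with one column of the model's unembedding matrix. The names logit_effects and top_and_bottom and the toy inputs are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder vector onto the unembedding.

    unembed is laid out as d_model rows x vocab columns; the result holds
    one score per vocabulary token (assumed scoring rule, see lead-in).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    bottom = [(tokens[t], scores[t]) for t in order[:k]]           # most negative first
    top = [(tokens[t], scores[t]) for t in reversed(order[-k:])]   # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    tokens = [" sorry", " pardon", "tnc"]
    decoder_vec = [1.0, 0.5]
    unembed = [[2.0, 0.0, -1.0],
               [0.0, 2.0, 0.0]]
    scores = logit_effects(decoder_vec, unembed)
    print(top_and_bottom(scores, tokens, k=1))
```

    With real model weights the same ranking would simply run over the full vocabulary and keep the top 10 in each direction, as in the tables above.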
    Activation Density: 0.012%
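    Read as a fraction, 0.012% is 0.00012, or roughly 120 activating tokens per million. The sketch below assumes density means the share of tokens on which the feature's activation is strictly positive; that definition is an assumption, and the function name activation_density is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens where the feature fires (activation > 0).

    Assumed definition; a dashboard may use a different threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


if __name__ == "__main__":
    # 12 firing tokens out of 100,000 gives a density of 0.012%.
    acts = [0.0] * 100_000
    for i in range(12):
        acts[i * 1000] = 1.5
    print(f"{activation_density(acts):.3%}")
```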

    No Known Activations