    Explanations

    apologies or laments

    instances of the word "sorry."

    Negative Logits (tokens quoted; a leading space inside the quotes is part of the token)
        " helicop"    -0.69
        "ccording"    -0.68
        "rower"       -0.67
        "krit"        -0.67
        "lav"         -0.66
        " holistic"   -0.65
        "tein"        -0.65
        "minecraft"   -0.64
        "vet"         -0.64
        "ief"         -0.64
    Positive Logits
        " sorry"       1.06
        " Sorry"       0.91
        "sorry"        0.87
        "Sorry"        0.85
        " excuse"      0.82
        " pardon"      0.79
        "GES"          0.75
        " Guys"        0.72
        " Ladies"      0.70
        " apologies"   0.69
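    The logit lists rank tokens by how strongly the feature pushes their output logits up or down. Below is a minimal sketch of how such a ranking could be produced. It assumes the effect on each token is the dot product of the feature's decoder direction with that token's unembedding vector (the direct path only, ignoring the final LayerNorm and any indirect paths). The helpers `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, not the viewer's actual code.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token) = unembedding[token] . decoder_direction,
# ignoring the final LayerNorm and all indirect paths through later layers.


def logit_effects(decoder_dir, unembed):
    """Return (token, effect) pairs for every token in `unembed`.

    decoder_dir: list of floats, the feature's decoder vector (length d_model)
    unembed: dict mapping token string -> list of floats (length d_model)
    """
    effects = []
    for token, vec in unembed.items():
        # Dot product between the token's unembedding and the feature direction.
        effects.append((token, sum(a * b for a, b in zip(decoder_dir, vec))))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative effects.

    Positive list is sorted largest first; negative list is sorted most
    negative first, matching the layout of the tables above.
    """
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Toy 2-dimensional example (made-up vectors, for illustration only).
decoder_dir = [1.0, 0.0]
unembed = {
    " sorry": [1.0, 0.2],
    " pardon": [0.8, 0.0],
    " helicop": [-0.7, 0.5],
    "minecraft": [-0.6, 0.1],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(top)     # [(' sorry', 1.0), (' pardon', 0.8)]
print(bottom)  # [(' helicop', -0.7), ('minecraft', -0.6)]
```

    In a real model the same dot product runs over the full vocabulary, and the two lists are simply the extremes of the resulting vector.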
    Activation Density: 0.009%
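    Here is a minimal sketch of the density figure. It assumes density means the percentage of tokens in a sample on which the feature's activation is nonzero, and `activation_density` is a hypothetical helper. Under that reading, 0.009% means the feature fires on roughly 9 of every 100,000 tokens.

```python
import math


def activation_density(activations):
    """Percentage of tokens on which the feature is active (activation > 0).

    Assumption: 'density' counts nonzero activations over all sampled tokens.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Toy sample: the feature fires on 9 of 100,000 tokens.
acts = [0.0] * 99_991 + [0.5] * 9
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.009%
```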

    No Known Activations