INDEX
    Explanations

    expressions of regret or apologies

    Negative Logits
    "fman"          -0.88
    "kefeller"      -0.85
    "uminati"       -0.77
    "apon"          -0.71
    "ament"         -0.71
    "ossession"     -0.70
    "indal"         -0.69
    "conservancy"   -0.68
    "ificent"       -0.68
    "ificantly"     -0.67
    Positive Logits
    "Sorry"          1.01
    "sorry"          0.92
    " Sorry"         0.89
    "Invalid"        0.87
    " Failed"        0.86
    " sorry"         0.84
    " unsupported"   0.80
    " mistaken"      0.77
    " miscar"        0.77
    " :("            0.77
    Activation Density: 0.040%

    No Known Activations