INDEX
    Explanations

    email-related words and phrases

    various forms of punctuation, specifically commas

    Negative Logits
    Token         Logit
    "hest"        -0.81
    "kinson"      -0.69
    "gow"         -0.69
    "ufact"       -0.66
    "abal"        -0.66
    "teenth"      -0.65
    " fronts"     -0.61
    "edom"        -0.60
    "ulton"       -0.60
    " Ballard"    -0.59
    Positive Logits
    Token         Logit
    " please"     0.80
    " nor"        0.76
    "please"      0.76
    " Please"     0.75
    "Sorry"       0.73
    " Cancel"     0.69
    "Please"      0.69
    " PLEASE"     0.66
    "ause"        0.63
    " despite"    0.60
    Activation Density: 0.017%

    No Known Activations