INDEX
    Explanations

    the word "don't."

    negative contractions, particularly "don't"

    Negative Logits
    Token             Logit
    " afore"          -0.67
    " vanquished"     -0.62
    " Species"        -0.59
    " ejected"        -0.59
    " VERS"           -0.58
    "ipel"            -0.57
    " Calls"          -0.57
    " elimination"    -0.57
    " Completed"      -0.57
    " Casting"        -0.56
    Positive Logits
    Token             Logit
    "'t"              1.57
    "ned"             1.23
    "ates"            0.98
    "ning"            0.93
    "atives"          0.87
    "uts"             0.86
    "nell"            0.84
    "ate"             0.84
    "nels"            0.83
    "etsk"            0.83
    Activation Density: 0.132%

    No Known Activations