INDEX
    Explanations

    contractions of "do not", weighted most heavily toward instances of the contraction "don't"

    negations or forms of the word "don't."

    Negative Logits
     reluct       -0.97
     newcom       -0.95
     exha         -0.93
     enthusi      -0.93
    Þ             -0.91
     pione        -0.91
    aditional     -0.88
     challeng     -0.88
     princ        -0.86
     conclud      -0.85
    Positive Logits
    't             1.62
    ned            1.18
    ning           1.03
    ates           0.93
    uts            0.92
    keys           0.84
    ate            0.84
    ´              0.83
    \'             0.82
    ners           0.81
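
    The two logit lists above read as the lowest- and highest-scoring tokens for this feature. As a minimal sketch of how such lists could be pulled from a token-to-logit mapping, the snippet below sorts the mapping and takes both ends. The function name `top_logit_tokens` and the `sample` dict are hypothetical; the sample values are copied from the lists above, and how the dashboard actually derives its scores is not stated on this page.

```python
def top_logit_tokens(token_logits, k=10):
    """Return (negative, positive): the k lowest and k highest scoring tokens.

    token_logits is a hypothetical dict mapping token string -> logit score.
    """
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(token_logits.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse the ascending order so the most positive tokens come first.
    positive = ranked[::-1][:k]
    return negative, positive


# Illustrative subset of the values listed above (leading spaces are part of the token).
sample = {
    "'t": 1.62,
    "ned": 1.18,
    "ning": 1.03,
    " reluct": -0.97,
    " newcom": -0.95,
    " exha": -0.93,
}

negative, positive = top_logit_tokens(sample, k=3)
for token, score in negative + positive:
    print(f"{token!r:12} {score:+.2f}")
```

    Tokens are printed with `repr` so that a leading space, as in ' reluct', stays visible.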
    Act Density 0.113%
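
    The page does not define "Act Density". A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below computes that fraction; the function name `activation_density`, the `threshold` parameter, and the synthetic activations are all hypothetical and chosen only so the result matches the figure shown above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds threshold.

    Assumes "density" means the share of active positions; this definition
    is an assumption, not something the page states.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Synthetic example: 113 active positions out of 100,000 sampled tokens.
acts = [1.5] * 113 + [0.0] * 99_887
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.113% means the feature fires on roughly one token in every 885.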

    No Known Activations