INDEX
    Explanations

    instances of the contraction "don't"

    expressions of uncertainty or refusal

    Negative Logits
    ategory         -0.75
     afore          -0.73
    artney          -0.72
     ANG            -0.69
     Agency         -0.66
     Passage        -0.62
     Anim           -0.62
    upp             -0.59
     Personality    -0.58
     Antar          -0.58
    Positive Logits
    't              1.31
    ned             0.89
    uts             0.83
    ates            0.79
    í               0.78
    anted           0.75
    nas             0.71
    kie             0.70
    na              0.69
    itzer           0.69
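Dashboards like this one usually fill the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The page does not say how its values were produced, so the sketch below is an assumption. The helper names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-d vectors are toy numbers, not the real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Hypothetical toy vectors (assumption), NOT the real model weights.
decoder_dir = [1.0, -0.5]
unembed = {
    "'t": [1.2, -0.2],
    "ned": [0.8, 0.0],
    " afore": [-0.7, 0.1],
    "upp": [-0.5, 0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
for tok, val in positive + negative:
    print(f"{tok!r:10} {val:+.2f}")
```

A positive score means that adding the feature's direction to the residual stream would raise that token's logit, and a negative score means it would lower it. This is why a feature explained as tracking "don't" would plausibly put `'t` at the top of its positive list.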
    Activation Density: 0.075%
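The density figure reads most naturally as the fraction of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero; the page does not define the threshold. The function name `activation_density` is hypothetical. At 0.075%, the feature would fire on roughly 1 in every 1,333 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 75 firing positions out of 100,000 tokens.
sample = [0.0] * 100_000
for i in range(0, 75_000, 1_000):
    sample[i] = 1.0

print(f"{activation_density(sample):.3%}")  # prints 0.075%
```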

    No Known Activations