INDEX
    Explanations

    the word "don't"

    the presence of the contraction "don't."

    Negative Logits
    Token             Logit
    "itiz"            -0.70
    " Penguin"        -0.66
    " Presence"       -0.66
    " spirited"       -0.63
    " Reloaded"       -0.61
    " Alleg"          -0.61
    " Sparrow"        -0.60
    "Reviewer"        -0.59
    " Laun"           -0.58
    "çĦ"              -0.58

    Positive Logits
    Token             Logit
    " necessarily"     1.19
    " bother"          1.07
    " know"            1.04
    " hesitate"        1.01
    " discriminate"    0.99
    " deserve"         0.98
    " appreciate"      0.96
    " seem"            0.95
    "intend"           0.93
    " belong"          0.93

    Act Density 0.097%

    No Known Activations