    Explanations

    contractions ending in "n't"

    Negative Logits
    Token         Logit
    " wouldn"     -1.23
    " didn"       -1.19
    " couldn"     -1.16
    "didn"        -1.04
    " wasn"       -1.03
    "wouldn"      -0.97
    "couldn"      -0.95
    " doesn"      -0.94
    " Couldn"     -0.93
    " Didn"       -0.92
    Positive Logits
    Token         Logit
    "'"            1.04
                   0.94
    "ʼ"            0.71
                   0.69
    "`"            0.66
    "ʻ"            0.62
    "´"            0.59
                   0.59
    "\'"           0.58
    <bos>          0.57
    Activation Density: 0.079%

    No Known Activations