    Explanations

    contractions and possessives

    Negative Logits
     '      0.89
    '       0.75
     "      0.67
    '-      0.66
     '-     0.64
    ',      0.63
            0.63
     '-'    0.60
    ','     0.57
    '?      0.55
    Positive Logits
     “‘     0.72
            0.70
     (’     0.68
    =’      0.61
     (‘     0.61
    ’’      0.59
    )’      0.59
    .’      0.57
     ‘’     0.57
     (“     0.56
    Act Density 0.002%

    No Known Activations