INDEX
    Explanations

    punctuation marks, particularly single quotes and parentheses

    Single quotation marks

    NEGATIVE LOGITS
     "        -1.06
     ("       -0.90
     “        -0.88
    ]["       -0.82
    {}".      -0.79
    ("        -0.79
    /"        -0.78
    >("       -0.76
    -"        -0.75
     "";      -0.74
    POSITIVE LOGITS
     ‘        1.46
    、『        1.35
     '        1.30
    ...'      1.29
    …’        1.24
    |'        1.20
     (‘       1.16
    。『        1.14
              1.13
    =’        1.13
    Act Density 0.170%

    No Known Activations