    Explanations

    punctuation marks, specifically parentheses

    Negative Logits
     {}'.
    -0.78
    ValueStyle
    -0.73
    <bos>
    -0.72
    {}'.
    -0.70
     ';
    
    -0.69
    [];
    
    -0.63
    `,
    
    -0.62
    <?,
    -0.62
     question
    -0.62
     }}</
    -0.61
    Positive Logits
     ("
    2.38
     (“
    2.29
    (“
    2.24
    ("
    2.13
     („
    1.87
     ('
    1.81
     (‘
    1.78
     («
    1.72
    ('
    1.67
    (‘
    1.63
    Act Density 0.075%

    No Known Activations