INDEX
    Explanations

    repeated use of the apostrophe

    Negative Logits
    "       -0.31
    )       -0.30
    '       -0.30
    :       -0.29
    .       -0.29
    ,       -0.25
            -0.22
    )↵      -0.22
    -       -0.22
    ]       -0.21
    Positive Logits
    /'      0.26
    ...'    0.23
    -'      0.22
    .'      0.21
    .'.     0.21
    ÂĢÂĻ    0.20
    --      0.19
     '.     0.18
            0.18
    ..'     0.18
    Act Density 0.085%

    No Known Activations