    Explanations

    punctuation marks and special characters

    Negative Logits
    Token   Logit
    。"      -1.04
    ...'    -1.02
    ..."    -0.96
    '...    -0.95
     ...'   -0.91
    "¿      -0.89
    ...."   -0.88
    ...".   -0.87
    ,'"     -0.84
     ..."   -0.84
    Positive Logits
    Token   Logit
     “      1.44
     ‘      1.40
            1.33
            1.23
    =”      1.18
    ’,      1.17
    .’      1.16
    ,’      1.14
    ’.      1.13
    =’      1.13
    Act Density 0.218%

    No Known Activations