    Explanations

    punctuation marks, primarily at the end of sentences or as part of dialogue

    Negative Logits
                -0.36
    ↵↵          -0.22
    ’s          -0.21
    :           -0.21
    ’t          -0.19
    &nbsp       -0.19
    ↵	↵         -0.18
    /or         -0.18
    ’m          -0.18
    ’re         -0.17
    Positive Logits
    ÂĿ          0.32
    That        0.18
     That       0.17
    ¦           0.17
     And        0.17
    '↵          0.17
    This        0.16
    "↵          0.16
     This       0.16
    And         0.16
    Activation Density: 0.123%
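
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only: the function name `activation_density`, the zero threshold, and the toy data are assumptions for illustration, not the dashboard's own computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 token positions, 123 of them active.
acts = [0.0] * 100_000
for i in range(123):
    acts[i * 800] = 1.5  # distinct positions, all below 100,000

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.123%
```

Under this reading, a density of 0.123% corresponds to roughly one active position per 800 tokens.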

    No Known Activations