    Explanations

    punctuation marks, specifically periods and exclamation points

    Negative Logits
    Token     Logit
    …↵↵       -0.19
    (blank)   -0.18
    ...↵↵     -0.18
    (blank)   -0.18
    --↵↵      -0.17
    (blank)   -0.17
    ”.        -0.17
    ’n        -0.17
    —↵↵       -0.17
    -↵↵       -0.16
    Positive Logits
    Token     Logit
    .↵        0.27
    ]↵        0.22
    )↵        0.22
    ).↵       0.22
    ा.↵       0.21
    U+202C↵   0.21
    。↵       0.21
    }↵        0.21
    ."↵       0.20
    >.↵       0.20
    Activation Density: 1.408%
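Activation density is usually reported as the percentage of tokens on which the feature's activation is above zero. Under that assumption, 1.408% means the feature fires on roughly 1,408 of every 100,000 tokens. The minimal sketch below computes that figure; the function name and the zero threshold are assumptions, not taken from this dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    total = active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```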

    No Known Activations