INDEX
    Explanations

    punctuation

    Negative Logits
    ’.”               -1.04
    ’”                -1.02
    .’”               -1.01
    ’).               -0.99
    .”)               -0.98
    ,’”               -0.98
    ?”.               -0.98
    ),”               -0.96
    ?“                -0.95
    —“                -0.95

    Positive Logits
    (not rendered)     0.63
    <eos>              0.61
    ↵↵                 0.55
    HomeAsUpEnabled    0.47
    (not rendered)     0.43
    ...                0.38
    ↵↵↵                0.38
    \\                 0.38
    :                  0.36
    ↵↵↵↵               0.36
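The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. One common way to build such lists is to project the feature's decoder direction through the model's unembedding matrix and sort the result. That is an assumption here, since the page does not state its method. The sketch below uses toy weights and a hypothetical three-token vocabulary; `logit_weights` and `top_tokens` are illustrative names, not any library's API.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix with d_model rows and one column per vocab token.

    Assumption: the dashboard's logit lists come from this product; the
    page itself does not say so.
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_tokens(weights: Sequence[float], vocab: Sequence[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k (token, weight) pairs with the largest weights,
    or the smallest (most negative) ones when largest=False."""
    ranked = sorted(zip(vocab, weights), key=lambda p: p[1], reverse=largest)
    return ranked[:k]


# Toy example: d_model = 2, hypothetical 3-token vocabulary.
vocab = ["<eos>", "')", ":"]
w = logit_weights([1.0, -0.5], [[0.2, -0.4, 0.6],
                                [0.4, 1.0, -0.2]])
print("positive:", top_tokens(w, vocab, k=2))
print("negative:", top_tokens(w, vocab, k=2, largest=False))
```

A real dashboard would do the same over the full vocabulary with a matrix library. The pure-Python loops are only for readability.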
    Activation Density: 0.045%
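Activation density is commonly defined as the percentage of sampled tokens on which a feature's activation is above zero. The page does not state its exact definition, so treat that as an assumption. Under it, 0.045% corresponds to roughly 45 firing tokens per 100,000. A minimal sketch, with `activation_density` as a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density is measured as the share of sampled tokens with
    activation > 0; the page does not state its exact definition.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 45 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_955 + [1.3] * 45
print(f"{activation_density(acts):.3f}%")  # prints 0.045%
```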

    No Known Activations