    Explanations

    punctuation and finality

    Negative Logits
     (      0.96
    。(     0.91
            0.82
            0.81
    .       0.80
    .(      0.76
    。《    0.75
     (_     0.73
            0.69
     (\     0.68
    Positive Logits
    !!)     0.93
    !)      0.79
    !)      0.76
     !)     0.75
    ...)    0.75
    !)      0.74
    …)      0.72
    *)      0.70
    +)      0.69
    ...),   0.68
    Activation Density: 0.915%

    No Known Activations