INDEX
    Explanations

    periods indicating sentence boundaries

    end of sentence connectors

    Negative Logits
    IntoConstraints   -0.97
    enderror          -0.92
    𑄮                 -0.91
     Dieſe            -0.90
    <unused79>        -0.90
    [@BOS@]           -0.90
    <pad>             -0.90
    <unused16>        -0.90
    <unused8>         -0.90
    <unused6>         -0.90
    Positive Logits
    ,                 0.29
    /                 0.27
     only             0.27
     not              0.26
     i                0.26
    -                 0.25
     in               0.24
     significantly    0.24
                      0.23
     (                0.23
    Act Density 0.008%

    No Known Activations