INDEX
    Explanations

    sequences of underscores or repeated special characters

    Negative Logits
    <tr>            -0.86
    .               -0.83
     it             -0.66
    </tr>           -0.65
     a              -0.65
    [toxicity=0]    -0.64
    )               -0.64
     the            -0.63
     The            -0.63
     I              -0.62
    Positive Logits
    +#+             1.44
     Efq            1.24
     Reſ            1.23
     Shakspeare     1.22
     Jefus          1.21
     Majefty        1.17
     Monfieur       1.15
     Chriftian      1.14
    Datuak          1.14
     itſelf         1.14
    Activation Density 0.908%

    No Known Activations