INDEX
    Explanations

    brackets and nested structures in mathematical expressions

    Negative Logits
     itſelf      -1.17
     pleaſure    -1.13
     myſelf      -1.11
    ſelves       -1.05
     Jefus       -1.05
     preſent     -1.05
     Reſ         -1.05
     raiſ        -1.05
     Majefty     -1.05
     ſtate       -1.03
    Positive Logits
    {                1.08
     ‘               0.83
     “               0.82
                     0.72
    (                0.72
    /                0.62
    __["             0.61
    []{              0.61
    [toxicity=0]     0.60
                     0.60
    Act Density 0.121%

    No Known Activations