INDEX
    Explanations

    table formatting characters

    Negative Logits
    ())));    0.87
    ());      0.83
    ()));     0.82
    …).       0.81
    ())).     0.81
    ()):      0.80
    !!!");    0.80
     ""),     0.79
    **        0.77
    ):=\      0.77
    Positive Logits
     |        4.11
    |         3.11
     $|       2.52
     |\       2.44
    }|        2.23
    |$        2.22
     |.       2.20
     |,       2.19
     |-       2.16
     \|       2.15
    Activation Density: 0.551%

    No Known Activations