INDEX
    Explanations

    mathematical or technical terminology and symbols used in formal contexts

    Negative Logits
    Token         Logit
    <unused23>    -1.54
    <unused41>    -1.54
    <unused16>    -1.53
    <unused8>     -1.53
    <unused42>    -1.53
    <unused79>    -1.53
    <unused74>    -1.53
    <unused51>    -1.53
    <unused43>    -1.53
    <pad>         -1.52
    Positive Logits
    Token         Logit
    .              0.76
    ↵↵             0.65
    ,              0.64
    [not shown]    0.60
    ␣(             0.53
    1              0.52
    -              0.49
    ;              0.48
    "              0.48
    ␣-             0.48
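The dashboard does not say how these logit scores are produced. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the highest- and lowest-scoring tokens. The sketch assumes that method; `top_logit_effects`, `decoder_row`, and `unembed` are hypothetical names, and the toy values are made up rather than taken from this feature.

```python
def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank vocabulary tokens by the direct effect of one feature's decoder
    direction on the output logits (assumed method, not confirmed here).

    decoder_row: list of d_model floats (the feature's decoder vector).
    unembed: d_model x vocab_size nested list (the unembedding matrix W_U).
    vocab: list of token strings, length vocab_size.
    Returns (positive, negative): the k highest and k lowest (token, score) pairs.
    """
    vocab_size = len(vocab)
    # Score for token t is the dot product of the decoder row with column t of W_U.
    scores = [
        sum(d * unembed[j][t] for j, d in enumerate(decoder_row))
        for t in range(vocab_size)
    ]
    # Sort token indices from lowest to highest score.
    ranked = sorted(range(vocab_size), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in ranked[:k]]
    positive = [(vocab[t], scores[t]) for t in ranked[::-1][:k]]
    return positive, negative


# Toy example with d_model = 2 and a 4-token vocabulary (all values hypothetical).
vocab = [".", ",", "<pad>", "<unused8>"]
unembed = [
    [0.8, 0.6, -1.5, -1.4],
    [0.1, 0.0, 0.2, -0.1],
]
decoder_row = [1.0, 0.5]
pos, neg = top_logit_effects(decoder_row, unembed, vocab, k=2)
print(pos)
print(neg)
```

Under this reading, large negative scores on tokens such as `<unused…>` and `<pad>` simply mean the decoder direction points away from their unembedding columns.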
    Activation Density: 0.217%
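Activation density is read here as the percentage of token positions on which the feature fires, i.e. its activation is above zero. The dashboard does not state the exact threshold, so that is an assumption, and `activation_density` is a hypothetical helper used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds
    `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical example: the feature fires on 2 of 1000 tokens.
acts = [0.0] * 998 + [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")  # 0.200%
```

At the listed 0.217%, the feature would fire on roughly 2 tokens in every 1,000.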

    No Known Activations