INDEX
    Explanations

    portions of mathematical or technical formatting

    Negative Logits
     ſind
    -0.87
     ſei
    -0.80
    iſen
    -0.75
    ſchaft
    -0.74
    ConstraintMaker
    -0.74
    ftagPool
    -0.73
    хьтан
    -0.73
    征詢我
    -0.73
     Elden
    -0.72
     للمعارف
    -0.72
    Positive Logits
    .
    0.56
    ↵↵
    0.55
    <eos>
    0.45
    </tr>
    0.43
    ].
    0.43
    3
    0.42
    .]
    0.41
    0.41
    .}
    0.40
    </table>
    0.39
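    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such a ranking could be produced. It assumes the common approach of taking the dot product of the feature's decoder direction with each unembedding column. The function name `top_logits` and the toy inputs are hypothetical and are not taken from the tool that produced this page.

```python
def top_logits(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption (hypothetical): the effect on token t is the dot product of
    the feature's decoder direction with column t of the unembedding matrix.
    """
    n_vocab = len(vocab)
    # effects[t] = sum_i decoder_dir[i] * unembed[i][t]
    effects = [
        sum(d * unembed[i][t] for i, d in enumerate(decoder_dir))
        for t in range(n_vocab)
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(n_vocab), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.6, 0.0],
    [0.4, 0.2, -0.2, 0.8],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
```

    In the toy example, "b" and "d" come out as the most negative tokens and "c" as the most positive. This mirrors how the dashboard separates suppressed tokens from promoted ones.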
    Act Density 0.019%
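    Activation density is read here as the fraction of sampled tokens on which the feature fires. That reading is an assumption: the page does not state its threshold or its sample size. Under that assumption, a density of 0.019% corresponds to roughly 19 firing tokens per 100,000. The helper `activation_density` below is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Converting the reported percentage into an expected count per 100,000 tokens.
density_percent = 0.019
expected_per_100k = density_percent / 100 * 100_000
```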

    No Known Activations