    Explanations

    references to module definitions and specific data handling structures in code

    NEGATIVE LOGITS
    Token      Value
    " <"       -0.51
    "1"        -0.50
    "2"        -0.47
    (blank)    -0.45
    (blank)    -0.45
    "3"        -0.44
    "><"       -0.40
    ">"        -0.40
    "0"        -0.40
    " }^{"     -0.39
    POSITIVE LOGITS
    Token          Value
    " beſte"       0.91
    " Geſch"       0.90
    "<unused8>"    0.88
    "[@BOS@]"      0.88
    "<unused52>"   0.88
    "<unused42>"   0.87
    "<unused43>"   0.87
    "<unused41>"   0.87
    "<unused16>"   0.87
    "<unused23>"   0.87
    Activation Density 0.581%

    No Known Activations