    Explanations

    references to specific events or organized activities

    Negative Logits
    Token              Logit
    " hit"             -0.28
    " P"               -0.28
    " ex"              -0.28
    " sha"             -0.26
    "Surprisingly"     -0.25
                       -0.25
    " staggering"      -0.25
    " Surprisingly"    -0.25
    " So"              -0.24
    " ME"              -0.24
    Positive Logits
    Token              Logit
    "<unused8>"        0.92
    "[@BOS@]"          0.92
    "<unused14>"       0.92
    "rungsseite"       0.92
    "<unused41>"       0.91
    "<unused28>"       0.91
    "<unused43>"       0.91
    "<unused51>"       0.91
    "<pad>"            0.91
    "<unused3>"        0.91
    Activation Density: 0.017%

    No Known Activations