INDEX
    Explanations

    punctuation and specific phrases indicating actions or interactions

    Negative Logits
     useParams    -0.32
    </em>         -0.31
    ↵↵            -0.29
     einzu        -0.29
     lleg         -0.29
     in           -0.28
                  -0.28
    Zus           -0.27
     ek           -0.27
     "            -0.27

    Positive Logits
    ſſung         0.84
    ſelf          0.82
    <unused16>    0.81
    [@BOS@]       0.81
    <unused8>     0.81
    <unused43>    0.81
    <unused80>    0.81
    <unused41>    0.81
    <unused3>     0.81
    <unused23>    0.81
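    The page does not say how these logit lists were computed. A common convention in SAE dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive tokens; assuming that convention, a minimal sketch follows. The helper names `logit_effects` and `top_and_bottom`, the row-per-d_model layout of the unembedding, and the toy matrices are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of one feature's decoder direction on every output logit.

    Assumption: decoder_dir is the feature's decoder row (length d_model) and
    unembed is stored as d_model rows of length vocab_size, so the result is
    decoder_dir @ unembed, one score per vocabulary token.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending by score
    negative = [(tokens[t], scores[t]) for t in order[:k]]
    positive = [(tokens[t], scores[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 3 tokens.
scores = logit_effects([1.0, -0.5], [[0.2, -0.4, 0.1], [0.0, 0.2, -0.6]])
negative, positive = top_and_bottom(scores, ["a", "b", "c"], k=1)
print(negative, positive)
```

    Slicing the reversed order (`order[::-1][:k]`) rather than `order[-k:]` keeps `k = 0` from accidentally returning the whole vocabulary.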
    Act Density 0.389%
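    Act Density is usually read as the percentage of sampled tokens on which the feature's activation is positive. The page does not state the threshold or the sample it used, so the sketch below assumes a threshold of zero and an in-memory list of per-token activations per text sample; `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens on which a feature's activation exceeds threshold.

    Assumption: activations holds one list of per-token activation values per
    text sample; the zero threshold is an illustrative default.
    """
    total = 0
    active = 0
    for sample in activations:
        for value in sample:
            total += 1
            if value > threshold:
                active += 1
    return 100.0 * active / total if total else 0.0


# Two toy samples, 8 tokens in total, 2 of which fire.
print(activation_density([[0.0, 1.2, 0.0], [0.0, 0.0, 0.0, 3.4, 0.0]]))  # prints 25.0
```

    Read this way, a density of 0.389% means the feature fires on roughly 389 of every 100,000 sampled tokens.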

    No Known Activations