INDEX
    Explanations

    conditional statements and requirements

    Negative Logits
    Token         Logit
    "<unused8>"   -0.79
    "[@BOS@]"     -0.79
    "<unused41>"  -0.79
    "<unused52>"  -0.79
    "ſelben"      -0.79
    "<unused23>"  -0.79
    "<unused43>"  -0.79
    "<unused17>"  -0.79
    "<pad>"       -0.79
    "<unused14>"  -0.79
    Positive Logits
    Token         Logit
    " gjø"        0.35
    "needs"       0.30
    " αρ"         0.28
    " needs"      0.28
    " served"     0.26
    " nh"         0.26
    " worden"     0.26
    " belongs"    0.26
    " should"     0.25
    " NEEDS"      0.25
    Act Density 0.167%

    No Known Activations