    Explanations

    expressions related to restriction and loss of freedom

    Negative Logits
    Token          Logit
    " Sachsen"     -0.38
                   -0.34
    " te"          -0.33
    "ome"          -0.33
    " Jacobs"      -0.33
    "1"            -0.32
    "-"            -0.32
    " stated"      -0.32
    " declared"    -0.32
    "/"            -0.32
    Positive Logits
    Token          Logit
    " zwiſchen"    0.76
    "<unused41>"   0.75
    "<unused42>"   0.75
    "<unused43>"   0.75
    "<unused74>"   0.75
    "<unused8>"    0.75
    "<pad>"        0.75
    "<unused17>"   0.74
    "<unused16>"   0.74
    "[@BOS@]"      0.74
    Activation Density: 0.053%

    No Known Activations