INDEX
    Explanations

    repeated phrases, particularly articles and demonstratives, indicating focus on specific references in the text

    Followed by adjectives

    determiners followed by nouns

    Negative Logits
    " the"               -0.45
    (blank)              -0.44
    " parcialmente"      -0.35
    " Widerspruch"       -0.35
    " a"                 -0.34
    " Landwirtschaft"    -0.33
    "The"                -0.33
    "begin"              -0.32
    " with"              -0.32
    " Stufe"             -0.32
    Positive Logits
    "<unused43>"         1.09
    "<unused41>"         1.09
    "<unused16>"         1.09
    "<unused28>"         1.09
    "<unused3>"          1.09
    "[@BOS@]"            1.09
    "<unused8>"          1.09
    "<unused14>"         1.09
    "<unused51>"         1.09
    "<unused52>"         1.09
    Activation Density: 1.352%
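
The density figure above can be read as the share of token positions on which the feature fires. The sketch below shows that calculation under stated assumptions: the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical, and the dashboard's exact definition (for example, which token sample it uses) is not given in the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    activation. The dashboard may define it over a different sample.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 token positions (made-up values).
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 25.000%
```

On this reading, a density of 1.352% would mean the feature is active on roughly 1 in 74 token positions.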

    No Known Activations