    Explanations

    occurrences of the word "the."

    Following the word "the"

    the definite article before a noun

    Negative Logits
    "<unused74>"   -0.98
    "<pad>"        -0.98
    "<unused8>"    -0.98
    "<unused14>"   -0.98
    "<unused52>"   -0.98
    "<unused80>"   -0.98
    "<unused42>"   -0.98
    "<unused41>"   -0.98
    "<unused16>"   -0.98
    "<unused23>"   -0.98

    Positive Logits
    " I"       0.45
    " the"     0.43
    "The"      0.37
    ","        0.34
    " The"     0.34
    " we"      0.34
    "."        0.33
    " In"      0.33
    " these"   0.33
    "main"     0.33
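
    The page does not say how these logit lists are produced. A common approach for SAE feature dashboards is to project the feature's decoder direction onto each token's unembedding vector and report the highest and lowest scores. The sketch below assumes that approach. The names top_logits, decoder_dir and unembed are hypothetical, and real pipelines may also apply a final layer norm or rescale the scores. Note that ties, such as the ten identical -0.98 entries above, keep their input order because Python's sort is stable.

```python
from typing import Dict, List, Tuple


def top_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score each token by the dot product of the feature's
    decoder direction with that token's unembedding vector (logit-lens style),
    then return the k highest and k lowest (token, score) pairs."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy 2-dimensional example (made-up vectors, not taken from this feature).
pos, neg = top_logits(
    [1.0, 0.0],
    {" the": [0.5, 0.1], "<pad>": [-1.0, 0.0], " I": [0.4, 0.9]},
    k=1,
)
print(pos, neg)
```

    With the toy vectors, " the" scores 0.5 and "<pad>" scores -1.0, so they head the positive and negative lists respectively.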

    Activation Density: 0.864%
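
    Activation density is usually read as the share of token positions on which the feature is active. The sketch below assumes "active" means an activation strictly greater than zero. The function name activation_density and the threshold parameter are hypothetical. Under that reading, 0.864% corresponds to about 8.64 active positions per 1,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0 for a ReLU-style SAE feature)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations over 5 tokens (made up): 2 of 5 are active.
print(activation_density([0.0, 1.2, 0.0, 0.0, 3.4]))
```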

    No Known Activations