    Explanations

    instances of the word "either."

    Negative Logits
    "<unused74>"    -1.09
    "<unused43>"    -1.09
    "<unused41>"    -1.09
    "<unused42>"    -1.09
    "<unused8>"     -1.09
    "<pad>"         -1.09
    "<unused14>"    -1.09
    "<unused23>"    -1.09
    "<unused79>"    -1.09
    "[@BOS@]"       -1.08
    Positive Logits
    " either"       0.43
    "ry"            0.41
    " matching"     0.36
    " is"           0.35
    "re"            0.33
    "either"        0.33
    "new"           0.32
                    0.32
    "3"             0.32
    " Either"       0.31
    Act Density 0.303%

    No Known Activations