INDEX
    Explanations

    the word "you" in various forms and contexts

    Negative Logits
     raiſ
    -1.13
    <unused43>
    -1.09
    <unused14>
    -1.09
    <unused41>
    -1.09
    <unused28>
    -1.09
    <unused23>
    -1.09
    <unused42>
    -1.09
    <unused74>
    -1.09
    [@BOS@]
    -1.09
    <unused8>
    -1.09
    Positive Logits
     You
    1.13
    you
    1.11
     YOU
    1.07
    YOU
    0.97
    You
    0.93
     they
    0.82
     we
    0.76
     you
    0.72
     it
    0.69
     him
    0.68
    Activation Density: 0.324%

    No Known Activations