Explanations
the word "you" in various forms and contexts
Negative Logits
Token        Logit
raiſ         -1.13
<unused43>   -1.09
<unused14>   -1.09
<unused41>   -1.09
<unused28>   -1.09
<unused23>   -1.09
<unused42>   -1.09
<unused74>   -1.09
[@BOS@]      -1.09
<unused8>    -1.09
Positive Logits
Token        Logit
You          1.13
you          1.11
YOU          1.07
YOU          0.97
You          0.93
they         0.82
we           0.76
you          0.72
it           0.69
him          0.68
Activations Density: 0.324%