INDEX
Explanations
occurrences of the word "the"
the + noun phrase
Negative Logits
[@BOS@]       -0.99
<unused43>    -0.99
<unused41>    -0.99
<unused74>    -0.98
<unused14>    -0.98
<unused8>     -0.98
<unused17>    -0.98
<unused23>    -0.98
<unused16>    -0.98
<unused3>     -0.98
Positive Logits
we            0.42
I             0.38
he            0.36
-             0.32
she           0.28
it            0.28
there         0.27
no            0.27
g             0.26
p             0.26
Activations Density 0.135%