INDEX
Explanations
occurrences of the word "the."
Following the word "the"
the definite article before a noun
Negative Logits
Token        Logit
<unused74>   -0.98
<pad>        -0.98
<unused8>    -0.98
<unused14>   -0.98
<unused52>   -0.98
<unused80>   -0.98
<unused42>   -0.98
<unused41>   -0.98
<unused16>   -0.98
<unused23>   -0.98
Positive Logits
Token        Logit
I            0.45
the          0.43
The          0.37
,            0.34
The          0.34
we           0.34
.            0.33
In           0.33
these        0.33
main         0.33
Activations Density 0.864%
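
As a rough illustration of how a density figure like the one above could be computed, the sketch below treats density as the fraction of token positions on which the feature fires. This definition, the function name activation_density, the threshold parameter, and the toy data are all assumptions for illustration. They are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active at all. The dashboard may use a different definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): the feature fires on 2 of 8 token positions.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.3, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.864% would mean the feature is active on a little under 1 in 100 token positions.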