Explanations
frequent conjunctions and pronouns
articles and prepositions followed by nouns
Negative Logits
Token        Logit
ſſung        -1.25
<unused8>    -1.24
<unused16>   -1.24
<unused28>   -1.24
<unused41>   -1.24
[@BOS@]      -1.24
<unused14>   -1.24
<unused23>   -1.24
<unused52>   -1.24
<unused17>   -1.24
Positive Logits
Token        Logit
(missing)    0.41
1            0.40
the          0.39
e            0.36
2            0.36
O            0.34
,            0.34
The          0.32
and          0.32
"            0.31
Activations Density 0.032%
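
The density figure can be read as a fraction of token positions. The sketch below assumes that density means the share of sampled tokens on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.032%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires; this definition is not confirmed by the source page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 4 of 12,500 token positions.
acts = [0.0] * 12_496 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.032%
```

Under that assumption, a density of 0.032% corresponds to the feature firing on roughly 1 in every 3,125 tokens.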