Explanations
occurrences of the word "the."
the + specific nouns
Negative Logits
SharedDtor       -0.86
parsedMessage    -0.85
featureID        -0.84
OGND             -0.83
fromnode         -0.82
Personendaten    -0.77
⟬                -0.73
<unused14>       -0.73
<unused41>       -0.72
<unused79>       -0.72
Positive Logits
the              1.16
The              1.13
THE              1.09
The              1.01
THE              0.92
the              0.89
their            0.63
ethe             0.59
sthe             0.57
OfThe            0.57
Activations Density 0.006%