Explanations
The neuron selectively activates on common English function words—short articles, prepositions, and conjunctions such as “in,” “the,” and “and.”
Negative Logits
Token           Logit
T               -0.07
.structure      -0.07
one             -0.07
nen             -0.06
Mill            -0.06
EACH            -0.06
cock            -0.06
那              -0.06
999             -0.06
tae             -0.06
Positive Logits
Token           Logit
.               0.08
).              0.07
paredStatement  0.07
("");↵↵         0.06
">↵             0.06
].              0.06
"//             0.06
")).            0.06
URIComponent    0.06
','"+           0.06
Activations Density 0.015%
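
The density figure can be read as the share of token positions on which the neuron fires. The sketch below is a minimal illustration under that assumption: the function name `activation_density`, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard or its tooling.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a non-zero
    (here: above-threshold) activation. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 3 active positions out of 20,000 tokens.
toy = [0.0] * 19_997 + [0.4, 1.2, 0.7]
print(f"{activation_density(toy):.3%}")  # prints 0.015%
```

A density this low means the neuron is silent on the vast majority of tokens, so any explanation should be checked against the small set of positions where it does fire.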