Explanations
This neuron detects boundary markers: words signaling “end” or “after” (e.g., the token “end” in English and “后面”, meaning “behind” or “after”, in Chinese).
Negative Logits
Token       Logit
出品者      -0.07
олот        -0.07
excludes    -0.07
оны         -0.06
Gear        -0.06
Walking     -0.06
미국        -0.06
forg        -0.06
changed     -0.06
itches      -0.06
Positive Logits
Token       Logit
_tasks      0.07
CENT        0.06
_SECRET     0.06
:last       0.06
brace       0.06
Mazda       0.06
.inf        0.06
verbosity   0.06
\uC         0.06
contend     0.06
Activations Density 0.240%
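
The density figure is presumably the fraction of sampled token positions on which the neuron fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the toy activations are hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the neuron fires,
    with "fires" taken to mean an activation strictly above `threshold`.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (made up): the neuron fires on 3 of 1250 token positions.
toy_activations = [0.0] * 1247 + [0.8, 1.3, 0.2]

density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.240% means the neuron is sparse: it stays silent on the great majority of tokens and fires on roughly 1 position in 400.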