Explanations
The neuron fires on structural/control tokens (the metadata and boundary markers surrounding the user/system/instruction blocks).
Negative Logits
också       -0.07
erosis      -0.06
_many       -0.06
weary       -0.06
ours        -0.06
Put         -0.06
/bus        -0.06
/fonts      -0.06
PROF        -0.06
.students   -0.06
Positive Logits
lock         0.08
seb          0.08
DNS          0.07
DNS          0.07
jam          0.07
DeepCopy     0.07
는지          0.07
_GUID        0.06
extensions   0.06
ns           0.06
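Logit lists like the two above are commonly computed from the neuron's direct effect on the output vocabulary. The sketch below assumes that approach: it projects a hypothetical output weight vector `w_out` (length d_model) through an unembedding matrix `W_U` of shape [d_model][vocab_size], then ranks the per-token scores. Any normalization or scaling the real dashboard applies is not reproduced, so magnitudes from this sketch will not match the values listed. The helper names `logit_effects` and `top_and_bottom` are illustrative only.

```python
from typing import List, Tuple


def logit_effects(w_out: List[float], W_U: List[List[float]]) -> List[float]:
    """Score every vocabulary entry by the dot product of the neuron's
    output direction with that token's unembedding column.

    Assumption: W_U is indexed [d_model][vocab_size], i.e. logits = resid @ W_U.
    """
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [
        sum(w_out[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest- and k lowest-scoring (token, score) pairs.

    Ranks token indices rather than strings, so two distinct tokens that
    print identically (e.g. with and without a leading space) both survive.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    top = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example with made-up weights (not taken from the real model).
vocab = ["lock", "weary", "DNS"]
w_out = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
top, bottom = top_and_bottom(logit_effects(w_out, W_U), vocab, k=1)
print(top[0][0], bottom[0][0])  # lock weary
```

Ranking by index is why two `DNS` rows can appear in the positive list: they may be distinct vocabulary entries that differ only in whitespace, although the page does not say so.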
Activation Density: 0.068%
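Activation density is the share of sampled tokens on which the neuron is active. The sketch below assumes "active" means an activation strictly above a threshold of 0. The dashboard's actual threshold and token sample are not stated on this page. `activation_density` is a hypothetical helper, and at 0.068% the neuron would fire on roughly 68 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "firing" when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample: 68 firing tokens out of 100,000.
acts = [1.0] * 68 + [0.0] * (100_000 - 68)
print(f"{activation_density(acts):.3f}%")  # 0.068%
```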