Explanations
Article previews
The neuron activates almost exclusively on structural or control tokens, such as end-of-text or header markers. It detects metadata and chat-formatting tokens rather than natural-language content.
Negative Logits
Token          Logit
928            -0.06
enc            -0.06
Abs            -0.06
fb             -0.06
FormData       -0.06
SUS            -0.06
caption        -0.06
PUR            -0.06
_Reference     -0.06
tube           -0.06
Positive Logits
Token          Logit
_HOUR          0.07
هفته           0.07
رسانه          0.07
적             0.07
_escape        0.07
mantle         0.06
hong           0.06
.BorderSize    0.06
影             0.06
camel          0.06
Activations Density: 0.008%