Explanations
This neuron fires on the special control tokens that mark the start and end of text and the speaker-header boundaries (e.g. <|begin_of_text|>, <|start_header_id|>, <|end_header_id|>, and the user/assistant role tags).
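As a rough illustration of which positions this explanation refers to, the sketch below locates control tokens in a prompt string. It assumes every control token is written in the <|name|> form shown above. The regular expression, the helper name find_control_tokens, and the sample prompt are hypothetical and are not taken from the dashboard or from any tokenizer.

```python
import re

# Assumption: control tokens look like <|name|>, with lowercase letters,
# digits, and underscores in the name (matches the examples in the explanation).
CONTROL_TOKEN = re.compile(r"<\|[a-z0-9_]+\|>")


def find_control_tokens(text: str) -> list[tuple[int, str]]:
    """Return (character offset, token text) for each control token in text."""
    return [(m.start(), m.group()) for m in CONTROL_TOKEN.finditer(text)]


# Hypothetical prompt built only from the tokens named in the explanation.
prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello"

for offset, token in find_control_tokens(prompt):
    print(offset, token)
```

This works on raw text rather than on token IDs, so it only approximates the positions a real tokenizer would assign; the role names themselves (user, assistant) are ordinary text and are not matched by the pattern.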
Negative Logits
Token            Logit
ớ                -0.07
body             -0.06
сон              -0.06
جوی              -0.06
教               -0.06
Manchester       -0.06
CM               -0.06
LW               -0.06
Swe              -0.06
Immediate        -0.06
Positive Logits
Token            Logit
desperate        0.07
USART            0.07
.onViewCreated   0.07
beauty           0.07
Dön              0.06
(columns         0.06
.constants       0.06
inker            0.06
Mutable          0.06
χρι              0.06
Activations Density: 0.003%