Explanations
questions and answers
This neuron primarily detects the model’s internal metadata markers (e.g. the special <|start_header_id|> token).
Negative Logits
freeze      -0.07
RID         -0.07
حی          -0.06
Mission     -0.06
806         -0.06
заказ       -0.06
activ       -0.06
operands    -0.06
宣          -0.06
rollout     -0.06
Positive Logits
.TryGetValue  0.07
ück           0.07
OMPI          0.07
PLY           0.07
proxy         0.07
Miller        0.07
uuml          0.06
spiked        0.06
ordinate      0.06
илися         0.06
Activations Density 0.005%
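To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that density means the share of sampled tokens on which the neuron's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    neuron is active, expressed as a percentage.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical sample: the neuron fires on 1 token out of 20,000.
sample = [0.0] * 19_999 + [3.2]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.005% corresponds to roughly one active token in every 20,000 sampled, which fits a neuron that responds only to a rare special token.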