Explanations
This neuron is effectively silent: it does not activate on any tokens and does not detect any pattern.
Negative Logits
poles        -0.07
capitalize   -0.07
fare         -0.06
цвет         -0.06
dirt         -0.06
Baseline     -0.06
Parallel     -0.06
dead         -0.06
ides         -0.06
enchmark     -0.06
Positive Logits
(jq              0.07
_TEMPLATE        0.06
,eg              0.06
hg               0.06
oğ               0.06
“그              0.06
raig             0.06
ع                0.06
authentication   0.06
nossa            0.06
Activation Density: 0.001%
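
To give a sense of scale for the density figure, here is a minimal sketch that assumes activation density means the fraction of token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions for illustration; the page itself does not say how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition; the source page does not state its formula.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position among 100,000 tokens.
sample = [0.0] * 99_999 + [0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.001%"
```

Under this assumed definition, a density of 0.001% corresponds to roughly one active token in every 100,000, which is consistent with describing the neuron as effectively silent.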