Explanations
This neuron never actually activates on any token, i.e. it is effectively "dead" and does not detect any specific pattern.
Negative Logits
OTH             -0.07
—"              -0.07
cannot          -0.07
、「             -0.07
(); ↵ ↵ ↵       -0.06
mentally        -0.06
Born            -0.06
comment         -0.06
ồng             -0.06
volunteered     -0.06
Positive Logits
유              0.07
Tage            0.07
lavender        0.06
'./../          0.06
ir              0.06
vår             0.06
дерева          0.06
olicited        0.06
republik        0.06
Backdrop        0.06
Activations Density 0.005%
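
As a rough illustration of how a density figure like the one above relates to the "dead" label, here is a minimal sketch. It assumes density means the fraction of tokens on which the neuron's activation exceeds a threshold. The function names, the `max_density` cutoff, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def is_effectively_dead(activations, max_density=1e-4, threshold=0.0):
    """Flag a neuron as effectively dead when its density is at or below
    `max_density`. The cutoff is an arbitrary, illustrative choice."""
    return activation_density(activations, threshold) <= max_density


# Hypothetical sample: 100,000 tokens, of which 5 produce a nonzero activation.
sample = [0.0] * 99_995 + [0.3] * 5

density = activation_density(sample)
print(f"density = {density:.3%}")
print("effectively dead:", is_effectively_dead(sample))
```

Under these assumptions, a neuron that fires on only a handful of tokens per hundred thousand falls below the cutoff. That is consistent with near-zero logit effects of similar magnitude in both directions.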