Explanations
This neuron is inactive on every token in these examples, so no particular pattern can be identified from them.
Negative Logits
_avg         -0.08
Snow         -0.06
802          -0.06
Election     -0.06
-Le          -0.06
unreliable   -0.06
]=="         -0.06
Berlin       -0.06
临           -0.06
-fold        -0.06
Positive Logits
numerator    0.08
prit         0.07
rhs          0.07
rhs          0.07
RHS          0.07
lhs          0.06
thoughtful   0.06
SqlDbType    0.06
daemon       0.06
writeln      0.06
Activations Density 0.005%