INDEX
Explanations
This neuron detects tokens with unusually low model confidence (i.e. rare or surprising tokens).
Negative Logits
Cache
-0.07
411
-0.06
�어
-0.06
Rewrite
-0.06
elastic
-0.06
Stefan
-0.06
�
-0.06
conditionally
-0.06
.WriteString
-0.06
BigNumber
-0.06
Positive Logits
ustanov
0.07
kriz
0.06
iks
0.06
しかし
0.06
`[
0.06
:";↵
0.06
uco
0.06
مثلا
0.06
%;↵
0.06
[${
0.06
Activations Density 0.052%
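
For readers unfamiliar with the density figure above, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of token positions where the neuron's activation exceeds a threshold; the dashboard's actual definition is not stated in the source. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the neuron fires.
    The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which 5 fire.
acts = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.052% would mean the neuron fires on roughly 5 of every 10,000 tokens.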