Explanations
The neuron activates on numeric tokens representing precise decimal values (i.e., floating-point numbers).
Negative Logits
��          -0.06
脂          -0.06
、↵         -0.06
因          -0.06
foobar      -0.06
Ted         -0.05
-copy       -0.05
身          -0.05
(ic         -0.05
sobě        -0.05
Positive Logits
Contract    0.07
occupations 0.07
adventurer  0.07
)_          0.07
opis        0.06
nouveaux    0.06
Category    0.06
했다        0.06
ンディ      0.06
minimal     0.06
Activations Density 0.004%