Explanations
The neuron selectively activates on numeric tokens—especially decimals and section or case citation numbers.
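As a rough illustration of the token classes named in the explanation above, the sketch below tags a token as a decimal or as a section/citation-style number. The two regular expressions and the function name `numeric_kind` are assumptions made for this example; they are not derived from the neuron itself and do not reproduce its actual activation pattern.

```python
import re
from typing import Optional

# Hypothetical patterns (assumptions, not taken from the dashboard):
# a decimal such as "3.14", and a section/citation-style number such as
# "1983", "§ 1983", or "12(b)(6)".
DECIMAL = re.compile(r"\d+\.\d+")
CITATION_LIKE = re.compile(r"(?:§\s*)?\d+(?:\(\w+\))*")


def numeric_kind(token: str) -> Optional[str]:
    """Return a coarse label for a numeric token, or None if it is not numeric."""
    token = token.strip()
    if DECIMAL.fullmatch(token):
        return "decimal"
    if CITATION_LIKE.fullmatch(token):
        return "citation_like"
    return None


if __name__ == "__main__":
    for tok in ["3.14", "§ 1983", "12(b)(6)", "record"]:
        print(repr(tok), "->", numeric_kind(tok))
```

A heuristic like this could only serve as a baseline to compare against the neuron's recorded activations; it says nothing about why the neuron fires.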
Negative Logits
Token           Logit
IPT             -0.07
mannen          -0.07
392             -0.06
inction         -0.06
nilai           -0.06
Transformers    -0.06
ucz             -0.06
bing            -0.06
ceries          -0.06
enza            -0.06
Positive Logits
Token           Logit
лся             0.07
서울특별시      0.07
record          0.07
Christina       0.07
pests           0.06
<hr             0.06
alongside       0.06
Veteran         0.06
Họ              0.06
occasional      0.06
Activations Density 0.001%