Explanations
quantifiers
The neuron selectively activates on numeric tokens, especially decimal numbers and digit‐containing measurements.
Negative Logits
ictionary     -0.07
Orr           -0.07
making        -0.07
Lopez         -0.06
quiz          -0.06
Order         -0.06
Kong          -0.06
widening      -0.06
Hernandez     -0.06
--------↵↵    -0.06
Positive Logits
aison         0.08
statt         0.07
스가          0.06
далі          0.06
Ģ             0.06
"()           0.06
Tôi           0.06
zwar          0.06
مستقیم        0.06
tends         0.06
Activations Density: 0.070%