INDEX
Explanations
This neuron activates on numeric tokens—integers, decimals, fractions, and signed numbers.
Negative Logits
Garrett      -0.07
_PARAM       -0.06
Kı           -0.06
_CAT         -0.06
".↵↵         -0.06
prefer       -0.06
-family      -0.06
password     -0.06
Incredible   -0.06
食           -0.06
Positive Logits
eliminated   0.06
patible      0.06
-article     0.06
переда       0.06
Beta         0.06
تس           0.06
внес         0.06
yle          0.06
.norm        0.06
-interface   0.06
Activations Density: 0.005%