Explanations
This neuron detects numeric tokens: numbers, years, scores, and other tokens that contain digits.
Negative Logits
formalism     0.34
hysteresis    0.34
deleterious   0.33
herence       0.32
passivation   0.32
paraphr       0.32
interfacial   0.31
astring       0.31
zelfde        0.31
multivalued   0.31
Positive Logits
Kochi         0.30
約            0.29
April         0.29
DiCaprio      0.29
올해          0.29
September     0.28
레            0.28
yaklaşık      0.28
、            0.28
🥇            0.28
Activations Density 0.069%