Explanations
academic citations
The neuron activates primarily on numeric tokens (e.g., volume, page, and year numbers, and other multi-digit numbers).
Negative Logits
Token   Logit
CA      -0.09
Cox     -0.08
c       -0.08
Oak     -0.08
cro     -0.08
aco     -0.08
ac      -0.08
Arc     -0.08
noc     -0.07
cale    -0.07
Positive Logits
Token   Logit
7       0.13
seven   0.09
107     0.09
197     0.09
97      0.09
7       0.09
307     0.08
७       0.08
27      0.08
七      0.08
Activations Density: 0.237%
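
As a rough illustration of how to read the density figure, the sketch below treats activation density as the fraction of token positions on which the neuron fires. That definition, the function name `activation_density`, the `threshold` parameter, and the toy counts are all assumptions made for illustration; they are not taken from the dashboard's own computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens with a
    non-zero (above-threshold) activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 237 active positions out of 100,000 sampled tokens.
toy_activations = [1.0] * 237 + [0.0] * 99_763
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.237% would mean the neuron is active on roughly 2 to 3 of every 1,000 tokens, which is sparse.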