Explanations
The neuron activates on numeric tokens (years, dates, percentages, and other numbers).
Negative Logits
Token     Logit
AV        -0.07
19        -0.07
15        -0.07
xDC       -0.07
23        -0.07
98        -0.07
Aid       -0.07
at        -0.07
At        -0.06
extent    -0.06
Positive Logits
Token     Logit
the       0.13
on        0.10
-the      0.10
The       0.10
THE       0.09
its       0.09
The       0.09
my        0.09
the       0.08
his       0.08
Activations Density 0.113%
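
The page does not define how the density figure is computed. As a rough sketch, assuming it means the fraction of sampled tokens on which the neuron's activation is above zero, it could be calculated as below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to illustrate a value of 0.113%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the neuron fires.
    The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample for illustration: 113 active tokens out of 100,000.
sample = [1.5] * 113 + [0.0] * (100_000 - 113)
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the neuron fires on roughly one token in a thousand, which fits a feature that responds to a narrow class of tokens.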