INDEX
Explanations
12 without zeros filtered out, the main thing this neuron does is find numbers as potential references or annotations within the text
references to specific numerical values or identifiers, particularly focusing on the number 12
Negative Logits
tremend   -0.83
behavi    -0.82
chwitz    -0.80
merce     -0.80
ileaks    -0.78
ifully    -0.77
icist     -0.77
iciary    -0.77
ierrez    -0.76
Ô         -0.76
Positive Logits
91        1.01
34        1.01
650       0.98
92        0.98
87        0.97
82        0.94
71        0.93
76        0.93
02        0.93
94        0.93
Activations Density 0.037%
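
For intuition about the density figure, here is a minimal sketch that assumes "activations density" means the fraction of token positions on which the neuron fires at all. The function name, the threshold of 0.0, and the toy data are hypothetical and are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 4 of them active.
toy_activations = [0.0] * 9996 + [0.5, 1.2, 0.3, 2.0]
density = activation_density(toy_activations)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.037% would mean the neuron is active on roughly 37 out of every 100,000 token positions.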