Explanations
overview
The neuron fires specifically on section-heading tokens, most notably the word "Overview".
Negative Logits
Token         Logit
Investment    -0.07
_into         -0.07
DETAIL        -0.06
acho          -0.06
Alex          -0.06
Daw           -0.06
random        -0.06
AINS          -0.06
013           -0.06
نو            -0.06

Positive Logits
Token         Logit
елеф          0.06
칙             0.06
Guide         0.06
pale          0.06
roids         0.06
Specialist    0.06
_LEFT         0.06
ark           0.06
attachments   0.05
lombok        0.05
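
The two lists above rank tokens by how strongly the neuron's output pushes their logits down (negative) or up (positive). On dashboards of this kind the per-token values are usually obtained by projecting the neuron's output weights through the model's unembedding matrix, but that is an assumption here and the projection itself is not shown. The sketch below only illustrates the ranking step: given a hypothetical token-to-effect mapping, it picks the k most negative and k most positive entries. The function name `top_logit_effects` and the toy vocabulary are illustrative and not part of any dashboard API.

```python
import heapq


def top_logit_effects(effects, k=10):
    """Return (k most negative, k most positive) (token, effect) pairs.

    `effects` is a hypothetical dict mapping token -> logit effect.
    """
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


# Hypothetical toy vocabulary; values loosely echo the tables above.
effects = {
    "Investment": -0.07,
    "DETAIL": -0.06,
    "the": 0.0,
    "Guide": 0.06,
    "lombok": 0.05,
}
neg, pos = top_logit_effects(effects, k=2)
print(neg)  # [('Investment', -0.07), ('DETAIL', -0.06)]
print(pos)  # [('Guide', 0.06), ('lombok', 0.05)]
```

Using `heapq.nsmallest` / `nlargest` avoids sorting the whole vocabulary when only the top few entries are displayed.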

Activation density: 0.007%
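
Activation density is read here as the share of sampled token positions on which the neuron fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's actual threshold and sample set are not stated): a density of 0.007% corresponds to roughly 7 firing positions per 100,000 tokens. The function name `activation_density` and the data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's stated rule.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 7 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 10_000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.007%
```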