Explanations
The neuron's activation rises steadily the further a token sits into a generated or quoted text, so it effectively detects "later" or "deep" positions in the token sequence.
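One rough way to check a position-tracking explanation like this is to correlate token index with activation over a single text. The sketch below assumes the per-token activations are already available as a plain list of floats; the function name `position_correlation` and the sample values are hypothetical and not taken from the dashboard.

```python
from math import sqrt


def position_correlation(activations):
    """Pearson correlation between token index (0, 1, 2, ...) and activation.

    Values near +1 are consistent with a neuron that fires more strongly
    deeper into the sequence; values near 0 suggest no linear position trend.
    """
    n = len(activations)
    if n < 2:
        raise ValueError("need at least two tokens")
    mean_pos = (n - 1) / 2
    mean_act = sum(activations) / n
    cov = sum((p - mean_pos) * (a - mean_act) for p, a in enumerate(activations))
    var_pos = sum((p - mean_pos) ** 2 for p in range(n))
    var_act = sum((a - mean_act) ** 2 for a in activations)
    if var_act == 0:
        # A constant activation has no trend; report 0 instead of dividing by zero.
        return 0.0
    return cov / sqrt(var_pos * var_act)


# Hypothetical activations for one short text (illustrative values only).
sample = [0.0, 0.0, 0.1, 0.3, 0.6, 1.0]
print(position_correlation(sample))
```

A high correlation on one text is only weak evidence; the same check would need to be repeated across many texts, including short ones, before trusting the explanation.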
Negative Logits
Token            Logit
psychologists    -0.07
syn              -0.07
transformer      -0.06
anti             -0.06
영국             -0.06
yon              -0.06
actal            -0.06
Streams          -0.06
65               -0.06
823              -0.06

Positive Logits
Token            Logit
maxx             0.08
../              0.07
.ms              0.06
ине              0.06
Redistributions  0.06
register         0.06
итай             0.06
following        0.06
contrib          0.06
.hide            0.06
Activations Density 0.067%
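
As an illustration of how a density figure of this kind could be computed, the sketch below assumes that "density" means the percentage of sampled tokens on which the neuron's activation exceeds a threshold (zero by default). That definition, the function name `activation_density`, and the sample data are assumptions and are not confirmed by the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumed definition: density = 100 * (active tokens) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 7 have a nonzero activation.
sample = [0.0] * 9993 + [0.4] * 7
print(f"{activation_density(sample):.3f}%")
```

Under this assumed definition, a density well below 1% means the neuron is silent on the vast majority of tokens.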