Explanations
This neuron's activation increases steadily with a token's distance from the start of the text, so it effectively acts as a positional counter that tracks how far into the document the current token is.
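One rough way to check a "positional counter" reading like this is to correlate each token's activation with its position in the text. The sketch below is illustrative only: `pearson` and `position_correlation` are hypothetical helper names, and the activation values are made up rather than measured from this neuron.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def position_correlation(activations):
    """Correlate each token's activation with its 0-based position."""
    positions = list(range(len(activations)))
    return pearson(positions, activations)


# Hypothetical activations for an 8-token text (made-up values, not measured).
toy_activations = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
r = position_correlation(toy_activations)
print(f"position/activation correlation: {r:.3f}")
```

A correlation near 1 across many texts would be consistent with the explanation above; a value near 0 would suggest the neuron responds to something other than position.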
Negative Logits
_job           -0.07
benchmark      -0.07
median         -0.07
-log           -0.07
Philippines    -0.06
Policy         -0.06
lav            -0.06
zw             -0.06
Policy         -0.06
leton          -0.06
Positive Logits
іс             0.07
педагог        0.07
sexuales       0.07
']}}</         0.06
(shader        0.06
�              0.06
ها             0.06
.dst           0.06
()}</          0.06
RGBO           0.06
Activations Density 0.046%
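The page does not define the density figure. Assuming it means the share of sampled tokens on which the neuron has a nonzero activation, 0.046% corresponds to roughly 46 active tokens per 100,000. The sketch below shows that arithmetic; `activation_density` is a hypothetical helper and the sample is synthetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of active tokens in a sample.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 46 active tokens out of 100,000 (not real data).
toy_sample = [1.0] * 46 + [0.0] * (100_000 - 46)
density = activation_density(toy_sample)
print(f"density: {density:.3%}")
```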