INDEX
Explanations
code snippets
The neuron detects Python code tokens (that is, tokens inside code blocks) rather than narrative text.
Negative Logits
ovie                  -0.07
EZ                    -0.07
evaluations           -0.06
.movie                -0.06
confidential          -0.06
ords                  -0.06
-X                    -0.06
なく                  -0.06
InternalEnumerator    -0.06
_videos               -0.06
Positive Logits
Seeing                0.07
ılığ                  0.07
�                     0.07
.You                  0.07
.There                0.06
propagate             0.06
�                     0.06
calming               0.06
zun                   0.06
withstanding          0.06
Activations Density 0.041%