INDEX
Explanations
The neuron consistently activates on numeric literal tokens in code.
Negative Logits
alysis
-0.07
>("
-0.06
jednání
-0.06
girlfriend
-0.06
Disease
-0.06
Between
-0.06
brief
-0.06
τους
-0.06
rend
-0.06
wait
-0.06
Positive Logits
uest
0.07
ledge
0.07
ApiController
0.07
asher
0.06
(@
0.06
مه
0.06
तम
0.06
식
0.06
<=$
0.06
认识
0.06
Activations Density 0.017%