Explanations
The neuron fires on ordinary English words appearing in code comments or explanatory prose rather than on actual code syntax.
Negative Logits
así
-0.07
daha
-0.07
Luật
-0.07
argc
-0.07
ẩn
-0.07
usher
-0.06
freaking
-0.06
undy
-0.06
3
-0.06
拟
-0.06
Positive Logits
/exp
0.07
الد
0.07
0.06
_U
0.06
-fix
0.06
ニメ
0.06
BO
0.06
(In
0.06
pcb
0.06
शहर
0.06
Activations Density 0.031%