INDEX
Explanations
The neuron detects tokens that form Google developer documentation URLs (e.g. “https://developers.google.com/...”).
Negative Logits
ruits       -0.07
captures    -0.07
(World      -0.07
Accred      -0.07
antes       -0.06
/close      -0.06
.maven      -0.06
autos       -0.06
اجرا        -0.06
sprites     -0.06
Positive Logits
Obama       0.06
Dense       0.06
골          0.06
↵           0.06
ان          0.06
cháy        0.06
weighted    0.06
้ม          0.06
aneous      0.06
UMENT       0.06
Activations Density 0.004%