Explanations
This neuron is effectively dead: it activates on only about 0.019% of tokens.
Negative Logits
Token          Logit
}};↵           -0.08
].             -0.07
relig          -0.07
dossier        -0.07
edeyse         -0.07
!              -0.07
expression     -0.07
urlpatterns    -0.07
].             -0.06
bridge         -0.06
Positive Logits
Token          Logit
管             0.07
orex           0.06
отов           0.06
eder           0.06
pueden         0.06
tá             0.06
timeouts       0.06
Ré             0.06
(dr            0.06
grabbed        0.06
Activations Density 0.019%
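
As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of tokens on which a neuron fires. It assumes density means "share of tokens with activation above a threshold"; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the neuron
    fires at all. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, only two of which activate the neuron.
acts = [0.0] * 10_000
acts[17] = 0.8
acts[4242] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low, combined with top logit effects that are all below 0.1 in magnitude, is the kind of profile that would fit the "effectively dead" description.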