INDEX
Explanations
Wikipedia articles
The neuron is effectively dead: it almost never activates on any token and thus does not appear to detect any feature.
Negative Logits
introduce      -0.08
transcripts    -0.07
ráno           -0.07
SPL            -0.07
International  -0.06
International  -0.06
svým           -0.06
international  -0.06
metav          -0.06
/Create        -0.06
Positive Logits
sqlCommand  0.07
ONLY        0.07
rol         0.07
↵↵          0.07
ãeste       0.07
annon       0.06
833         0.06
_brand      0.06
↵↵          0.06
애          0.06
Activations Density 0.004%
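To put the density figure in context, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of sampled token positions on which the neuron's activation is above zero; the page does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is taken to mean the share of active positions;
    the source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 4 active positions out of 100,000 tokens.
sample = [0.0] * 99_996 + [0.3, 1.2, 0.7, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.004%
```

Under this reading, a density of 0.004% corresponds to roughly 4 active positions per 100,000 tokens, which fits the "effectively dead" description even though the neuron is not strictly silent.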