INDEX
Explanations
introduction
This neuron detects section heading labels, in particular the “Introduction” heading.
Negative Logits
quisites    -0.08
hou         -0.07
tercih      -0.07
Resist      -0.06
故          -0.06
hogy        -0.06
snake       -0.06
IMPLIED     -0.06
fou         -0.06
tout        -0.06
Positive Logits
const       0.07
.flush      0.07
(itr        0.06
(Route      0.06
_DIG        0.06
acyj        0.06
splendid    0.06
opa         0.06
(;          0.06
outweigh    0.06
Activations Density: 0.001%
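
The page does not define how the density figure is computed. A minimal sketch, assuming it means the share of sampled tokens on which the neuron's activation exceeds zero, might look like the following. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    neuron fires; the source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: one active token out of 100,000.
sample = [0.0] * 99_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.001% would correspond to the neuron firing on roughly one token in every 100,000, which fits a neuron tied to a narrow trigger such as a specific section heading.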