INDEX
Explanations
Category
The neuron specializes in detecting “Category:” labels that mark the metadata/category section of a Wiki-style article.
Negative Logits
elim      -0.08
Christ    -0.08
Version   -0.07
[no token text]  -0.07
Time      -0.07
Bur       -0.06
Bur       -0.06
CAST      -0.06
/$',      -0.06
člově     -0.06
Positive Logits
.AutoScale  0.07
uky         0.06
tricks      0.06
.seconds    0.06
ounty       0.06
.react      0.06
слу         0.06
.my         0.06
田          0.06
ुछ          0.06
Activations Density 0.005%
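To give a feel for how sparse a density of 0.005% is, the short sketch below converts the percentage into an expected count of activating tokens. It assumes that "density" means the fraction of tokens on which the neuron activates at all; the page does not define the term, and the function name `expected_active_tokens` is hypothetical, introduced only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the neuron fires.

    Assumption (not stated on the page): density is the fraction of
    tokens with a nonzero activation, expressed as a percentage.
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # Density taken from the page; the corpus size is an arbitrary example.
    print(expected_active_tokens(0.005, 1_000_000))
```

Under that reading, the neuron would fire on roughly 50 tokens per million, which fits a feature tied to a single, rare marker such as a "Category:" label.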