INDEX
Explanations
Categories/Metadata
The neuron strongly activates on tokens that appear as Wikipedia category labels (the “Category:” lines at the end of an article).
Negative Logits
807
-0.06
пись
-0.06
von
-0.06
entropy
-0.06
сл
-0.06
423
-0.06
』(
-0.06
Angiosper
-0.06
سیاست
-0.06
Сем
-0.06
Positive Logits
0.07
lal
0.07
anv
0.06
AND
0.06
eval
0.06
uteč
0.06
={}
0.06
↵ ↵ ↵
0.06
.Program
0.06
ocide
0.06
Activations Density 0.024%
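The density figure can be read as the share of sampled tokens on which the neuron fires at all. The page does not state its exact definition or threshold, so the sketch below is only one plausible reading: it assumes density is the percentage of tokens whose activation exceeds zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens with a
    nonzero activation; the dashboard does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 3 active tokens among 12,500 sampled tokens.
sample = [0.0] * 12497 + [1.3, 0.8, 2.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.024% corresponds to roughly one active token in every 4,000 or so, which fits a neuron tied to a narrow context such as category labels.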