Explanations
lists of categories
The neuron activates on tokens appearing in Wikipedia “Category:” lines.
Negative Logits
grams         -0.07
hei           -0.07
ама           -0.07
.environment  -0.07
/tag          -0.07
official      -0.06
unst          -0.06
[at           -0.06
docs          -0.06
Embed         -0.06
Positive Logits
最后           0.07
_DEVICES      0.07
\Field        0.07
대한          0.06
\L            0.06
ัฒนา          0.06
podnikatel    0.06
Opened        0.06
náro          0.06
subway        0.06
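Logit lists like the two above are commonly produced by projecting a neuron's output direction onto the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that method. The vocabulary, the output weight vector `w_out`, the unembedding table `W_U`, and every number in it are hypothetical and made up for illustration. They are not the real weights behind this neuron, and real dashboards may apply extra steps (for example, folding in layer norm).

```python
from typing import Dict, List, Tuple

def logit_effects(w_out: List[float], W_U: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the neuron's output direction with each token's unembedding vector."""
    return {tok: sum(a * b for a, b in zip(w_out, col)) for tok, col in W_U.items()}

def top_k(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens with their scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]

# Hypothetical 3-dimensional residual stream and 4-token vocabulary (illustrative values only).
w_out = [0.5, -0.2, 0.1]
W_U = {
    "grams": [-0.1, 0.2, 0.0],
    "docs": [0.0, 0.1, -0.3],
    "subway": [0.1, 0.0, 0.1],
    "Opened": [0.2, -0.1, 0.0],
}
effects = logit_effects(w_out, W_U)
neg, pos = top_k(effects, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other: a token can only appear in one of them when the vocabulary is larger than 2k.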
Activation Density: 0.055%
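Activation density is usually read as the fraction of sampled tokens on which the neuron fires, so 0.055% corresponds to roughly 55 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The helper `activation_density` and the sample activations are hypothetical, for illustration only.

```python
from typing import Sequence

def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed to be 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)

# Hypothetical sample: 100,000 token positions, 55 of which fire.
acts = [0.0] * 100_000
for i in range(0, 55_000, 1_000):  # indices 0, 1000, ..., 54000 -> 55 positions
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.055%
```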