Explanations
disambiguation
The neuron fires on parenthesized Wikipedia‐style disambiguation or type labels (e.g. “(disambiguation)”, “(surname)”, “(given name)”).
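As a rough illustration of the text pattern described above, the sketch below finds parenthesized labels in a string with a regular expression. It is only a surface-level proxy for the explanation, not a description of how the neuron computes its activation. The names `LABELS`, `LABEL_PATTERN`, and `find_labels` are hypothetical, and the label list contains only the three examples quoted in the explanation.

```python
import re

# Assumption: only the three labels quoted in the explanation are listed.
# Extend this tuple to cover other Wikipedia-style type labels.
LABELS = ("disambiguation", "surname", "given name")

# Matches a parenthesized label such as "(surname)" and captures the label text.
LABEL_PATTERN = re.compile(
    r"\((" + "|".join(re.escape(label) for label in LABELS) + r")\)"
)


def find_labels(text):
    """Return the labels found inside parentheses, in order of appearance."""
    return LABEL_PATTERN.findall(text)


if __name__ == "__main__":
    print(find_labels("Mercury (disambiguation) and Smith (surname)"))
```

Comparing strings that match this pattern against the neuron's top-activating examples is one simple way to sanity-check the explanation.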
Negative Logits
南          -0.07
ноября      -0.07
862         -0.07
*           -0.06
Fant        -0.06
169         -0.06
Mich        -0.06
Stamina     -0.06
Ferr        -0.06
オ          -0.06
Positive Logits
DOWN         0.06
bli          0.06
��드         0.06
원이         0.06
coordinate   0.06
_GT          0.06
-coordinate  0.06
lie          0.06
Fix          0.05
charm        0.05
Activations Density 0.001%