Explanations
This neuron never activates (all token activations are zero), so it doesn’t detect any particular pattern.
Negative Logits
otomy      -0.07
entend     -0.07
yet        -0.06
Trường     -0.06
终         -0.06
(blank)    -0.06
uses       -0.06
074        -0.06
igor       -0.06
Twe        -0.06
Positive Logits
pane       0.06
ayım       0.06
crisp      0.06
,',        0.06
deceased   0.06
'',        0.06
Renault    0.06
Erect      0.06
Nude       0.06
'_',       0.06
Activations Density 0.003%