Explanations
punctuation
This neuron activates on tokens associated with dialogue speaker attributions (e.g. character names or their labels introducing a line of speech).
Negative Logits
Directory     -0.08
不了          -0.07
processors    -0.06
cite          -0.06
Har           -0.06
�             -0.06
.,            -0.06
уска          -0.06
semp          -0.06
cribed        -0.06
Positive Logits
acey          0.06
/Dk           0.06
((__          0.06
"',↵          0.06
]},↵          0.06
visions       0.06
ožná          0.06
]++;↵         0.06
])+           0.06
}`);↵         0.06
Activations Density: 0.094%