Explanations
Conversational responses
The neuron fires on the assistant’s polite greeting and offer-to-help phrases (e.g. “Hello! I’d be happy to help you with…”).
Negative Logits
Token      Logit
времени    -0.07
ает        -0.07
.eng       -0.06
belief     -0.06
.Tree      -0.06
沈         -0.06
obe        -0.06
_ATOMIC    -0.06
mach       -0.06
интер      -0.06
Positive Logits
Token         Logit
oen           0.08
cursed        0.07
heel          0.07
cite          0.07
elow          0.06
jon           0.06
terrible      0.06
=['           0.06
[data         0.06
sustaining    0.06
Activations Density: 0.064%
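
As a rough illustration of what a density figure like this could mean, the sketch below assumes that activation density is the fraction of sampled token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the synthetic sample are all assumptions made for illustration; the page itself does not say how the figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where
    the neuron fires at all, not a probability density.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data (not taken from the dashboard): 100,000 token
# positions, of which 64 have a nonzero activation.
sample = [0.0] * 99_936 + [1.0] * 64

density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

Under this reading, a density of 0.064% would mean the neuron fires on roughly 64 of every 100,000 tokens, which fits a neuron that responds only to a narrow pattern such as greeting phrases.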