INDEX
Explanations
The neuron activates on the assistant’s expressions of willingness or offers to help (e.g. “help you,” “happy to help,” “be happy to assist”).
Negative Logits
_Length       -0.07
кра           -0.07
Shirley       -0.07
.cards        -0.07
ció           -0.07
/ca           -0.06
-stars        -0.06
.gf           -0.06
_income       -0.06
intestinal    -0.06
Positive Logits
hasher        0.07
JButton       0.06
ECT           0.06
Anyone        0.06
alloween      0.06
Mud           0.06
іть           0.06
hearts        0.06
Halloween     0.06
础            0.06
Activations Density 0.025%