INDEX
Explanations
forms of "to be"
This neuron fires on the assistant’s standard self-introductions and offers of help, i.e. phrases like “I’m here to help/assist you.”
Negative Logits
_reward        -0.07
Independent    -0.06
brittle        -0.06
혀             -0.06
Chic           -0.06
Pas            -0.06
isinin         -0.06
Tes            -0.06
Spin           -0.06
gm             -0.06
Positive Logits
exacerb        0.07
atheist        0.06
kB             0.06
escalated      0.06
Ž              0.06
-ready         0.06
gehen          0.06
буд            0.06
lda            0.06
odable         0.06
Activation Density: 0.013%
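For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly computed, assuming it means the fraction of sampled tokens on which the neuron's activation is above zero. The page does not state its exact definition, so the function name `activation_density`, the `threshold` parameter, and the sample data below are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 13 of which activate the neuron.
sample = [0.0] * 99_987 + [0.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

Under this reading, a density of 0.013% would mean the neuron is active on roughly 13 out of every 100,000 tokens, i.e. it fires very sparsely.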