Explanations
Language or not speaking English
The neuron activates on phrases where the assistant acknowledges or switches into a particular human language (e.g. “em português,” “en español,” “auf Deutsch,” “en français”).
Negative Logits
Token                 Logit
/md                   -0.07
(no token shown)      -0.06
salad                 -0.06
_REFRESH              -0.06
组                    -0.06
Mars                  -0.06
fled                  -0.06
ектив                 -0.06
bindActionCreators    -0.06
üzerinde              -0.06
Positive Logits
Token                 Logit
xm                    0.07
ourke                 0.06
_Em                   0.06
áv                    0.06
reasons               0.06
[↵↵                   0.06
llum                  0.06
riteln                0.06
hya                   0.06
imiter                0.06
Activations Density 0.037%
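
As a rough illustration of what a density figure like the one above implies, the sketch below assumes that density means the fraction of token positions on which the neuron's activation is above a threshold (zero here). That definition, the function names, and the sample values are assumptions made for illustration and are not taken from the page itself.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold`.

    Assumed definition: the page does not state how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of active positions for a density given in percent."""
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # Hypothetical activations over four token positions; one is active.
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
    # Under the assumed definition, 0.037% is roughly 37 active
    # positions per 100,000 tokens.
    print(round(expected_active_tokens(0.037, 100_000)))
```

Read this way, a density of 0.037% describes a sparse neuron: it would be active on only a few dozen positions per hundred thousand tokens.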