Explanations
The neuron fires on the assistant’s Chinese greeting phrases (e.g., “你好…”, meaning “Hello…”) at the start of its replies.
Negative Logits
Cog         -0.07
prefix      -0.06
map         -0.06
┴           -0.06
an          -0.06
Recipe      -0.06
ANGE        -0.06
urlparse    -0.06
/ne         -0.06
OX          -0.06
Positive Logits
peptides    0.07
бороть      0.07
produits    0.07
bryster     0.07
.SetFloat   0.07
.isChecked  0.07
خودش        0.07
حالت        0.07
<?↵         0.06
história    0.06
Activations Density 0.017%