INDEX
Explanations
This neuron fires on words and short phrases used to introduce advice or recommendations (e.g. “start,” “try,” “make,” “use,” “consider”).
Negative Logits
essential     -0.06
punishments   -0.06
/misc         -0.06
CHANNEL       -0.06
açıklam       -0.06
jspb          -0.06
btn           -0.06
чт            -0.06
inverse       -0.06
.Hidden       -0.06
Positive Logits
�             0.07
급            0.06
έναν          0.06
na            0.06
вест          0.06
leží          0.06
Chavez        0.06
ㅇ            0.06
šek           0.06
λμ            0.06
Activations Density 0.057%