Explanations
This neuron flags directive, second-person instructional language, specifically “you”-addressed commands, modal verbs (e.g. “will,” “would”), and instruction verbs (e.g. “tell you,” “simulate”) used to instruct the assistant.
Negative Logits
Isl
-0.07
focused
-0.06
gravitational
-0.06
comb
-0.06
actual
-0.06
نحو
-0.06
lında
-0.06
description
-0.06
substantially
-0.06
درس
-0.06
Positive Logits
\E
0.06
.setVisibility
0.06
ENERGY
0.06
coinc
0.06
�
0.06
("[
0.06
HUGE
0.06
oily
0.06
サイ
0.06
decid
0.06
Activations Density 0.006%