Explanations
The neuron primarily fires on the first‐person pronoun “I,” marking sentences where the speaker refers to themselves.
Negative Logits
residency   -0.06
cấu         -0.06
달          -0.06
ermann      -0.06
<Contact    -0.06
Merchant    -0.06
フ          -0.06
略          -0.06
(history    -0.06
alpha       -0.06
Positive Logits
\core       0.07
_ic         0.07
.Material   0.06
nda         0.06
zure        0.06
pe          0.06
OSH         0.06
rib         0.06
ime         0.06
_APP        0.06
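The two lists can be read as the vocabulary tokens whose logits this neuron most directly raises (positive) or lowers (negative). A minimal sketch of how such lists are commonly produced, assuming the scores come from projecting the neuron's output weight vector onto the unembedding matrix with layer norm ignored. The names `w_out`, `W_U`, `logit_effects`, and `top_and_bottom` and the toy values are hypothetical and are not taken from this page:

```python
from typing import List, Sequence, Tuple


def logit_effects(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of one unit of neuron activation on each vocabulary logit.

    w_out: the neuron's output weights (length d_model).
    W_U:   unembedding matrix indexed as W_U[d][t] (d_model rows, vocab columns).
    Layer norm and bias terms are ignored (simplifying assumption).
    """
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [
        sum(w_out[d] * W_U[d][t] for d in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # smallest effects first
    positive = list(reversed(ranked))[:k]   # largest effects first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 3 tokens.
w_out = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
tokens = ["a", "b", "c"]

effects = logit_effects(w_out, W_U)  # approximately [0.15, -0.25, 0.1]
positive, negative = top_and_bottom(effects, tokens, k=1)
print([t for t, _ in positive])  # prints ['a']
print([t for t, _ in negative])  # prints ['b']
```

With a real model the same ranking runs over the full vocabulary with k = 10, which is the shape of the two lists above; the small magnitudes (around ±0.06) are consistent with a single neuron having only a weak direct effect on any one token.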
Activation Density: 0.134%
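Activation density is read here as the percentage of token positions on which the neuron activates. A minimal sketch, assuming "activates" means an activation strictly above a threshold of 0 (the threshold behind the figure above is not stated). The function name and the sample data are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 134 firing positions out of 100,000 tokens.
sample = [0.0] * 99_866 + [0.5] * 134
print(f"{activation_density(sample):.3f}%")  # prints 0.134%
```

On this reading, a density of 0.134% means the neuron fires on roughly 1 token in 750, which fits a narrow feature such as the pronoun "I".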