Explanations
The neuron fires on second-person references that address the reader, particularly “you” and related forms such as “your”.
Negative Logits
bil           -0.07
:F            -0.07
Fi            -0.07
el            -0.06
[F            -0.06
Fel           -0.06
Lim           -0.06
ölüm          -0.06
Lil           -0.06
(reordered    -0.06
Positive Logits
you           0.24
You           0.21
you           0.19
You           0.18
YOU           0.17
.You          0.17
-you          0.14
your          0.14
"You          0.13
—you          0.13
Activations Density 0.746%
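
The density figure above can be read as the share of token positions on which the neuron is active. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: `activation_density`, its `threshold` parameter, and the toy data are hypothetical names and values chosen for illustration, not the dashboard's actual implementation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron fires;
    the source page does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 7 of which activate the neuron.
toy = [0.0] * 993 + [0.5] * 7
print(f"{activation_density(toy):.3%}")
```

Under this reading, a density of 0.746% would mean the neuron is active on roughly 7 or 8 of every 1000 tokens, which fits a neuron that responds mainly to one family of words.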