INDEX
Explanations
Text related to instructions, setup, guidance, or descriptions of AI behaviors.
The neuron selectively activates on the pronoun “it” (in both lowercase and uppercase forms).
Negative Logits
veloper        -0.08
Friends        -0.06
doctors        -0.06
budd           -0.06
leaders        -0.06
й              -0.06
comed          -0.06
bullying       -0.06
volunteering   -0.06
orrent         -0.06
Positive Logits
batchSize      0.08
searching      0.07
матері         0.06
柴             0.06
{}]            0.06
toward         0.06
vertex         0.06
雪             0.06
(.)            0.06
없이           0.06
Activations Density 0.005%
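For orientation, a density figure like the one above can be read as the share of token positions on which the neuron fires at all. The sketch below is a minimal illustration under that assumption. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of positions with a nonzero
    (above-threshold) activation; the dashboard's exact definition is not
    stated on this page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the neuron fires on 1 of 20,000 token positions.
acts = [0.0] * 19_999 + [3.2]
print(f"{activation_density(acts):.3f}%")
```

With this made-up input, one active position in 20,000 corresponds to a density of 0.005%, which is the order of sparsity reported above.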