Explanations

The neuron activates on first‐person self-references, i.e. tokens like “I,” “my,” and other personal commentary markers.

"I" followed by personal action or feeling

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
" slutet"      -1.11
"at"           -1.05
"DAD"          -1.00
"过来了"        -0.98
" augusti"     -0.96
"ijk"          -0.95
"iseks"        -0.94
"扱"           -0.94
" terbesar"    -0.94
" unique"      -0.93

Positive Logits

Token           Logit
"respective"    1.22
" astuces"      1.17
"Watching"      1.16
"事が"           1.15
" stratég"      1.13
" personally"   1.09
" Watching"     1.07
" neither"      1.07
" myself"       1.05
"𓆏"            1.03

Activations Density 0.052%
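
To put the density figure in perspective, the following is a rough back-of-the-envelope sketch. It assumes that the 0.052% density was measured over the same dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each) and that density means the fraction of tokens with a nonzero activation; the page does not state either of these explicitly. The function name estimate_active_tokens is hypothetical and used only for illustration.

```python
# Figures copied from the dashboard; how they relate is an assumption (see above).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.052 / 100  # 0.052% expressed as a fraction


def estimate_active_tokens(num_prompts, tokens_per_prompt, density):
    """Return (total tokens, estimated number of tokens with a nonzero activation)."""
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens, round(total_tokens * density)


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"Total tokens in prompt set: {total_tokens:,}")
print(f"Estimated activating tokens: {active_tokens:,}")
```

Under these assumptions the neuron fires on roughly 1,600 of about 3.1 million tokens, which is consistent with a sparse feature tied to a narrow pattern such as first-person self-reference.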