Explanations

A followed by specific noun

np_acts-logits-general · gemini-2.5-flash-lite

The neuron activates on isolated uppercase “A” tokens used as answer labels or section headings (e.g. the “A” marking an answer or answer‐score annotation).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Uppercase “A” at the start of a word or as a standalone token, often at the beginning of a line or sentence.

oai_token-act-pair · gpt-5 Triggered by @jyhe0408

The single capital letter "A" when it appears as a standalone token or label in technical/programming contexts.

oai_token-act-pair · claude-4-5-sonnet Triggered by @jyhe0408

Configuration: google/gemma-scope-27b-pt-res/layer_10/width_131k

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token         Logit
" diarios"    -2.03
"刳"          -2.03
"și"          -2.02
"黝"          -1.95
" และ"        -1.85
" produz"     -1.80
"şi"          -1.79
"だけです"    -1.77
"anc"         -1.68
"いえば"      -1.66

Positive Logits

Token         Logit
" have"       1.91
"也不會"      1.83
"穩"          1.78
(blank)       1.73
"並不是"      1.73
(blank)       1.72
" There"      1.71
" Saltar"     1.67
" from"       1.66
"渽"          1.66

Activations Density 0.005%
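
As a rough sanity check on the density figure, the sketch below estimates how many token positions would fire if the 0.005% density were measured over exactly the dashboard's 24,576 prompts of 128 tokens each. That is an assumption: the page does not state which token set the density is computed over, and the variable names are illustrative only.

```python
# Figures copied from the dashboard; the link between them is an assumption.
num_prompts = 24_576        # "24,576 prompts"
tokens_per_prompt = 128     # "128 tokens each"
density_percent = 0.005     # "Activations Density 0.005%"

# Total token positions in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt

# Expected number of positions with non-zero activation,
# assuming the density was measured on this same token set.
expected_active = total_tokens * density_percent / 100

print(f"total token positions: {total_tokens:,}")
print(f"expected active positions: {expected_active:.1f}")
```

Under that assumption the feature would fire on roughly 157 of about 3.1 million token positions, which is consistent with a feature keyed to one narrow use of a single token.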
