Explanations

refusing harmful or explicit content

np_acts-logits-general · gemini-2.5-flash-lite

The neuron is highly active on punctuation (e.g. semicolons, parentheses) and on discourse‐marker words (e.g. “but,” “however,” “etc.”), i.e. tokens that structure and link clauses.

oai_token-act-pair · o4-mini · Triggered by @jyhe0408

periods at the end of sentences.

oai_token-act-pair · claude-4-5-sonnet · Triggered by @jyhe0408

sentence-boundary punctuation and transitional discourse markers that signal shifts, contrast, or emphasis in the discourse.

oai_token-act-pair · gpt-5 · Triggered by @jyhe0408

Configuration

google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

 podían

0.78

 многочис

0.75

 كانوا

0.69

多くの

0.68

新たな

0.68

étaient

0.67

 많은

0.66

 ہمیں

0.66

面白

0.66

 Рассмотрим

0.66

Positive Logits

 myself

1.27

但我

1.16

 minhas

1.15

 estoy

1.12

我有

1.10

私は

1.09

我会

1.08

但是我

1.08

我不

1.07

 tengo

1.06

Activations Density 0.438%
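As a rough sanity check on the figures above, the density can be turned into an approximate count of activating tokens. This is only a sketch: it assumes the density is the fraction of token positions with nonzero activation and that it was measured over the full prompt set listed under Prompts (Dashboard). The page states neither, and the variable names are illustrative.

```python
# Figures copied from the dashboard above.
PROMPTS = 392_802          # number of prompts
TOKENS_PER_PROMPT = 256    # tokens in each prompt
DENSITY_PERCENT = 0.438    # reported activation density, in percent

# Total token positions in the prompt set.
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = share of token positions where the feature fires.
expected_active = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens:        {total_tokens:,}")
print(f"approx. active tokens: {expected_active:,}")
```

Under those assumptions the feature would fire on roughly one token in every 230, which is sparse but not rare.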

No Known Activations
