Explanations

concept definition structure

np_acts-logits-general · gemini-2.5-flash-lite

The neuron fires strongly on standalone content‐word tokens that tend to appear as list items or section headings (e.g. “list,” “comics,” “overall,” “according,” “cascading,” “effect,” “dates,” etc.).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

periods at the end of sentences or list items in structured text formats.

oai_token-act-pair · claude-4-5-sonnet Triggered by @jyhe0408

language describing ordering and structured arrangement of items or actions, such as sequences, lists, organization, and procedural flow.

oai_token-act-pair · gpt-5 Triggered by @jyhe0408

Configuration

google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

(token not rendered)    0.71
 inasmuch               0.70
(token not rendered)    0.60
 satisfactorily         0.59
；                      0.58
 farther                0.56
子                      0.56
(token not rendered)    0.55
。                      0.55
――                      0.54

Positive Logits

 rebranded              0.82
 geopolitical           0.78
 impactful              0.77
 disinformation         0.75
 reimag                 0.75
 viewership             0.75
 geopol                 0.74
 metaverse              0.73
 livestream             0.73
 chiamato               0.71
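
The page does not say how the two logit lists above are produced. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the resulting scores. The sketch below uses a made-up 2-dimensional model and a 4-token placeholder vocabulary; `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and none of the numbers come from this dashboard.

```python
import numpy as np

def top_logit_tokens(decoder_dir, unembed, vocab, k=2):
    """Rank tokens by the logit effect of one feature direction.

    decoder_dir: (d_model,) feature decoder vector (assumed input).
    unembed:     (d_model, vocab_size) unembedding matrix (assumed input).
    Returns (positive, negative): the k largest and k smallest effects.
    """
    effects = decoder_dir @ unembed      # shape (vocab_size,)
    order = np.argsort(effects)          # ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data for illustration only.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = np.array([1.0, 0.0])
unembed = np.array([[0.8, -0.7, 0.1, 0.0],
                    [0.3,  0.2, 0.5, 0.9]])

positive, negative = top_logit_tokens(decoder_dir, unembed, vocab)
```

With this toy input the positive list is led by `tok_a` and the negative list by `tok_b`. A real pipeline may apply normalization before the projection, which this sketch leaves out.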

Activations Density 0.070%
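
To put the density in context, the arithmetic below is a rough sketch. It assumes that the density is the fraction of tokens on which the feature activates and that it was measured over the dashboard prompt set listed under Configuration; the page confirms neither. The function name `expected_active_tokens` is hypothetical, and because the displayed density is rounded, the result is only an estimate.

```python
def expected_active_tokens(n_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total = n_prompts * tokens_per_prompt
    return total, total * density_percent / 100.0

# Figures taken from this page: 392,802 prompts, 256 tokens each, 0.070% density.
total_tokens, active_tokens = expected_active_tokens(392_802, 256, 0.070)
print(f"{total_tokens:,} tokens, roughly {active_tokens:,.0f} with a nonzero activation")
```

Under those assumptions the feature would fire on roughly 70 thousand of about 100 million tokens.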
