Explanations

potentially coercive, influence writing, interpreted as

np_acts-logits-general · gemini-2.5-flash-lite

The neuron fires on structural or metadata tokens, such as section headings, numbered list titles, citation entries, dates, and other document-formatting markers.

oai_token-act-pair · o4-mini Triggered by @jyhe0408

very high numerical values, especially large numbers in the hundreds.

oai_token-act-pair · claude-4-5-sonnet Triggered by @jyhe0408

document headings, headlines, and bibliographic/reference metadata within nonfiction text.

oai_token-act-pair · gpt-5 Triggered by @jyhe0408

Configuration

google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" lysate"        0.93
" Sheng"         0.93
" rinsing"       0.93
" rationalize"   0.92
" detergents"    0.90
" camphor"       0.90
"тной"           0.89
" Río"           0.89
" enteric"       0.88
" carnivorous"   0.87

Positive Logits

(token not shown)   1.05
(token not shown)   0.83
"ui"                0.82
"👇"                0.81
"ob"                0.79
"un"                0.78
"no"                0.77
"에"                0.76
"ug"                0.76
"na"                0.76

Activations Density 0.001%
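
To give the density figure a sense of scale, the sketch below combines it with the prompt counts listed under Prompts (Dashboard). It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation, and that the density was measured over the same 392,802 prompts of 256 tokens each; neither assumption is stated on the page. The displayed 0.001% is also rounded, so the result is an order-of-magnitude estimate only.

```python
# Rough scale check for the dashboard numbers.
# Assumption: density = share of tokens with a nonzero activation,
# measured over the same prompt set the dashboard lists.
NUM_PROMPTS = 392_802        # "392,802 prompts"
TOKENS_PER_PROMPT = 256      # "256 tokens each"
DENSITY_PERCENT = 0.001      # "Activations Density 0.001%" (rounded display value)

# Total tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Convert the percentage to a fraction and scale by the token count.
estimated_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

Under these assumptions the feature would be active on roughly a thousand tokens out of about a hundred million, which fits a very sparse feature.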

No Known Activations
