Explanations

data dictionary or list

np_acts-logits-general · gemini-2.5-flash-lite

The neuron fires on uncommon, domain-specific tokens, that is, rare or specialized words (such as “sponsored,” “vaccines,” “processed,” “txt,” and “modes”).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

the word "sponsored" or variations of "processed" in formal or technical contexts.

oai_token-act-pair · claude-4-5-sonnet Triggered by @jyhe0408

sections of formal web-style boilerplate or promotional/administrative copy (privacy-policy language, sponsored content notices, URLs/parentheticals) within longer texts.

oai_token-act-pair · gpt-5 Triggered by @jyhe0408

Configuration

google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium
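The identifier above appears to pack several settings into one slash-separated path. As a minimal sketch, assuming the four-part layout and the `layer_<n>_width_<w>_l0_<level>` pattern visible in this one string (the field names and the `parse_sae_id` helper are hypothetical and not part of any library), it could be split apart like this:

```python
import re

# Identifier copied verbatim from the Configuration entry above.
SAE_ID = "google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium"


def parse_sae_id(sae_id: str) -> dict:
    """Split an identifier of the assumed form org/release/hook/variant.

    The field names are illustrative assumptions inferred from this
    single example; they are not an official schema.
    """
    parts = sae_id.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 slash-separated parts, got {len(parts)}")
    org, release, hook, variant = parts

    # Assumed variant pattern: layer_<int>_width_<size>_l0_<level>
    match = re.fullmatch(r"layer_(\d+)_width_(\w+?)_l0_(\w+)", variant)
    if match is None:
        raise ValueError(f"unrecognized variant string: {variant!r}")

    return {
        "org": org,
        "release": release,
        "hook": hook,
        "layer": int(match.group(1)),
        "width": match.group(2),
        "l0": match.group(3),
    }


info = parse_sae_id(SAE_ID)
print(info)
```

Under those assumptions, the entry reads as layer 24 of the residual stream (`resid_post`), with a 16k-wide dictionary and a "medium" L0 setting.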

Prompts (Dashboard)

392,802 prompts, 256 tokens each
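For a sense of scale, the two figures above imply a total token count. The sketch below assumes every prompt is exactly 256 tokens long, as the line states; the variable names are illustrative.

```python
# Figures taken from the "Prompts (Dashboard)" entry above.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256  # assumes every prompt is exactly this long

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,}")  # prints 100,557,312
```

That is roughly 100 million tokens scanned for activations.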

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" возникновения"  0.93
"ждый"  0.92
"っいて"  0.91
"жной"  0.88
"ườn"  0.81
"льной"  0.81
"ющей"  0.80
"्टी"  0.80
" года"  0.79
" нем"  0.79

Positive Logits

"جا"  0.92
"da"  0.82
"Ön"  0.80
" bağlant"  0.77
"Rub"  0.77
"ども"  0.76
"za"  0.75
" striées"  0.75
"Rose"  0.75
"мережа"  0.75

Activations Density 0.000%

No Known Activations