Explanations

override/overwrite

np_acts-logits-general · gemini-2.5-flash-lite

The neuron fires on technical/programming jargon and instruction‐style terms (e.g. “implementation,” “initialize,” “accessing,” “preferences”), i.e. code-related documentation language.

oai_token-act-pair · o4-mini · Triggered by @jyhe0408

comments or explanatory text in programming or technical documentation.

oai_token-act-pair · claude-4-5-sonnet · Triggered by @jyhe0408

descriptions of visceral physical or emotional reactions in the body.

oai_token-act-pair · gpt-5 · Triggered by @jyhe0408

Configuration

google/gemma-scope-2-12b-pt/resid_post/layer_31_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted)   Logit
"穂"             0.56
" aloof"         0.54
" कच्च"          0.52
"QueueMutex"     0.51
"winding"        0.49
"VERS"           0.49
" शै"            0.49
" berakhir"      0.49
"\!\"            0.48
" réduite"       0.48

Positive Logits

Token (quoted)   Logit
"Override"       1.87
" override"      1.87
"override"       1.86
" Override"      1.70
" overrides"     1.59
" overridden"    1.59
" overriding"    1.52
"Overrides"      1.23
"overwrite"      1.08
" overwrite"     1.06

Activations Density 0.022%
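
To put the density figure in context, here is a rough back-of-the-envelope sketch in Python. It assumes, without confirmation from the dashboard, that the 0.022% density is measured over the same prompt set listed under Configuration (392,802 prompts of 256 tokens each). The variable names are illustrative only.

```python
# Rough estimate only. ASSUMPTION: the activation density is computed over
# the dashboard prompt set (392,802 prompts x 256 tokens); the page does
# not state this explicitly.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
ACTIVATION_DENSITY = 0.022 / 100  # 0.022% expressed as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
estimated_active_tokens = total_tokens * ACTIVATION_DENSITY
tokens_per_activation = 1 / ACTIVATION_DENSITY

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
print(f"about one activation per {tokens_per_activation:,.0f} tokens")
```

Under that assumption the feature would be active on roughly 22 thousand of about 100 million tokens, which fits a feature tied to a narrow family of tokens such as the override/overwrite variants in the positive logits.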

No Known Activations