Explanations

Rory, drunk, Russ, GMO

np_acts-logits-general · gemini-2.5-flash-lite

The neuron primarily detects the word "drunk" and its variations, especially in contexts related to intoxication.

oai_token-act-pair · claude-3-7-sonnet-20250219 Triggered by @neilrathi

This neuron is essentially a “memorization” detector that spikes specifically on the tokens “Rory” and “Drunk.”

oai_token-act-pair · o4-mini Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_10/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

(empty token)  -2.89
汭  -2.56
呖  -2.53
荭  -2.48
↵  -2.39
ஂ  -2.36
しています  -2.34
 белая  -2.33
 blir  -2.31
轵  -2.31

Positive Logits

(empty token)  3.05
ing  2.80
嫆  2.66
唢  2.52
 verlangt  2.48
OUTUBE  2.45
 grünen  2.44
 ewigen  2.38
(empty token)  2.36
做的  2.34

Activations Density 0.004%
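
To put the density figure in perspective, the short sketch below converts it into an approximate token count using the dashboard configuration listed above (24,576 prompts of 128 tokens each). It assumes that "density" means the fraction of dashboard tokens on which the feature has a non-zero activation; the page does not state that definition. The variable names are illustrative only.

```python
# Figures copied from the dashboard configuration on this page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.004  # reported value, already rounded by the dashboard

# Total number of token positions the dashboard was computed over.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = share of tokens with a non-zero activation.
estimated_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")                              # total tokens: 3,145,728
print(f"estimated active tokens: {round(estimated_active_tokens)}")   # estimated active tokens: 126
```

Because the reported density is rounded, the result is only an order-of-magnitude estimate: under this assumption the feature fires on roughly a hundred or so of about 3.1 million tokens, which fits a feature that responds to a handful of specific words.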
