Explanations

not followed by descriptive words

np_acts-logits-general · gemini-2.5-flash-lite

The neuron activates on negation words (e.g., “not,” “don’t,” “isn’t”).

oai_token-act-pair · o4-mini Triggered by @jyhe0408

explicit grammatical negation in sentences.

oai_token-act-pair · gpt-5 Triggered by @jyhe0408

the word "not" when it appears in negative statements or denials.

oai_token-act-pair · claude-4-5-sonnet Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_10/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"ᚔ"  -2.05
" appris"  -1.90
").}"  -1.76
"僦"  -1.72
"ꪻ"  -1.62
" hés"  -1.61
"有一定的"  -1.59
"θλη"  -1.59
"硨"  -1.59
" phénom"  -1.57

Positive Logits

" nawet"  2.17
" even"  1.95
" даже"  1.76
[unrendered token]  1.65
"and"  1.61
"這"  1.60
" sogar"  1.58
" with"  1.51
"for"  1.50
" этого"  1.50
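
The logit lists above are commonly read as the feature's direct effect on next-token predictions. A minimal sketch of one way such scores can be computed, assuming they come from dotting the feature's decoder direction with each token's unembedding row. The function names, vectors, and numbers below are hypothetical toy values, not Gemma weights or dashboard data:

```python
# Toy illustration only: every vector here is made up, not taken from the model.

def logit_effects(decoder_direction, unembedding):
    """Dot the feature's decoder direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(decoder_direction, row))
        for token, row in unembedding.items()
    }


def top_and_bottom(effects, k):
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Hypothetical 3-dimensional decoder direction and unembedding rows.
decoder_direction = [1.0, -0.5, 0.25]
unembedding = {
    " even": [2.0, 0.0, 0.0],
    " with": [1.0, -1.0, 0.0],
    " the": [0.0, 0.0, 0.0],
    " appris": [-2.0, 0.0, 0.0],
}

effects = logit_effects(decoder_direction, unembedding)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Tokens whose unembedding rows align with the decoder direction land in the positive list, and tokens pointing the opposite way land in the negative list.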

Activations Density 0.016%
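
To give the density figure a sense of scale, the arithmetic below combines it with the prompt configuration listed above. This is a rough sketch that assumes the density was measured over the same 24,576 prompts of 128 tokens each, which the page does not state explicitly. The function name is illustrative.

```python
# Assumption: the 0.016% density refers to the dashboard prompt set.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.016


def expected_active_tokens(prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, approximate number of tokens where the feature fires)."""
    total = prompts * tokens_per_prompt
    return total, total * density_percent / 100


total_tokens, active_tokens = expected_active_tokens(
    PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(total_tokens)          # 3145728
print(round(active_tokens))  # 503
```

Under that assumption, the feature fires on roughly 500 of about 3.1 million token positions, which is consistent with a feature tied to a narrow lexical pattern.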

No Known Activations