Explanations

common English tokens

np_max-act · gemini-2.0-flash

The neuron fires on the first content word that opens a new response or section, i.e. sentence-initial or discourse-marker tokens such as “It,” “This,” and “Tengo.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"_far"        -0.07
"ính"         -0.06
"urious"      -0.06
" barr"       -0.06
" Země"       -0.06
" Curtain"    -0.06
" bottoms"    -0.06
" letting"    -0.06
" única"      -0.06
" BOTTOM"     -0.06

Positive Logits

"Deserializer"      0.07
"\trecord"          0.07
".isSelected"       0.06
"andFilterWhere"    0.06
"xab"               0.06
" positively"       0.06
" extensive"        0.06
"ög"                0.06
" olduğ"            0.06
" psychological"    0.06
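
The page does not say how the logit lists are produced. A common convention for feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the most negative and most positive resulting values. The sketch below illustrates that convention on random toy data; `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, not taken from this dashboard or any library.

```python
import numpy as np

def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: logit effect = decoder direction projected through the
    unembedding matrix. The dashboard's exact procedure is not documented here.
    """
    logits = decoder_dir @ unembed          # shape: (vocab_size,)
    order = np.argsort(logits)              # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data standing in for a real decoder row and unembedding matrix.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
decoder_dir = rng.normal(size=d_model)
unembed = rng.normal(size=(d_model, vocab_size))
vocab = [f"tok{i}" for i in range(vocab_size)]

negative, positive = top_logit_tokens(decoder_dir, unembed, vocab, k=3)
```

Under this reading, values as small as ±0.06 to ±0.07 would suggest the feature has little direct effect on any particular output token.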

Activations Density 0.228%
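
The density figure is usually read as the fraction of sampled tokens on which the feature is active. The page does not state the exact definition, so the sketch below assumes "active" means an activation strictly greater than zero; `activation_density` and the toy array are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means activation > 0; the dashboard may use a
    different cutoff or sampling scheme.
    """
    activations = np.asarray(activations, dtype=np.float32)
    return float(np.mean(activations > threshold))

# Toy example: 228 active positions out of 100,000 sampled tokens.
toy = np.zeros(100_000, dtype=np.float32)
toy[:228] = 1.0
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.228%
```

At this rate the feature would fire on roughly one token in every 440.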

No Known Activations