Explanations

hills

np_max-act · gemini-2.0-flash

The neuron detects descriptive mentions of undulating landscape features, especially “rolling hills.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token        Logit
"Sure"       -0.07
" Insider"   -0.07
"edit"       -0.06
"ists"       -0.06
" harm"      -0.06
"'],'"       -0.06
" Fried"     -0.06
"IRECT"      -0.06
" spheres"   -0.06
"-course"    -0.06

Positive Logits

Token        Logit
" Episode"   0.07
"gil"        0.07
" dolphins"  0.07
"iless"      0.07
"optgroup"   0.06
" Poetry"    0.06
" pockets"   0.06
"birthday"   0.06
"营业"       0.06
"attle"      0.06

Activations Density 0.005%
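
As a rough illustration of what a density figure like this can mean, the sketch below assumes that activation density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy activation values are all assumptions made for illustration; they are not taken from the dashboard or from the SAE's training code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all (activation > 0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation value")
    return active / total


# Toy data (hypothetical): the feature fires on 5 of 100,000 token positions.
toy_activations = [0.0] * 100_000
for position, value in [(10, 0.8), (2_000, 1.2), (30_000, 0.3), (55_555, 2.1), (99_999, 0.5)]:
    toy_activations[position] = value

density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.005% corresponds to the feature firing on about 1 in every 20,000 tokens, which is what the toy data reproduces.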

No Known Activations