
Explanations

ED

np_max-act · gemini-2.0-flash

The neuron strongly activates on the “ED” fragment in the uppercase heading “UNITED STATES,” effectively detecting the “UNITED STATES” label in document headings.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
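To make the configuration concrete, here is a minimal sketch of how a "standard" sparse autoencoder turns a residual-stream vector taken at blocks.11.hook_resid_post into feature activations. It assumes the common ReLU formulation (subtract the decoder bias, apply the encoder, then ReLU). The weights are random placeholders rather than the trained trainer_1 parameters, and the sizes are shrunk so it runs quickly: the real dictionary has 131,072 features, and Llama 3.1 8B's hidden size is 4,096.

```python
import numpy as np

# Toy sizes so the sketch runs quickly (real: d_model = 4,096, 131,072 features).
D_MODEL = 16
N_FEATURES = 64

rng = np.random.default_rng(0)
# Hypothetical randomly initialised parameters, NOT the trained SAE weights.
W_enc = rng.normal(scale=0.1, size=(D_MODEL, N_FEATURES)).astype(np.float32)
b_enc = np.zeros(N_FEATURES, dtype=np.float32)
W_dec = rng.normal(scale=0.1, size=(N_FEATURES, D_MODEL)).astype(np.float32)
b_dec = np.zeros(D_MODEL, dtype=np.float32)


def encode(x: np.ndarray) -> np.ndarray:
    """Map residual-stream vectors (..., D_MODEL) to nonnegative feature activations."""
    return np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)


def decode(f: np.ndarray) -> np.ndarray:
    """Reconstruct residual-stream vectors from feature activations."""
    return f @ W_dec + b_dec


# One 1,024-token context window of fake resid_post vectors.
x = rng.normal(size=(1024, D_MODEL)).astype(np.float32)
f = encode(x)
x_hat = decode(f)
print(f.shape, x_hat.shape)
```

Each column of `f` corresponds to one feature; the feature on this card is a single such column, and it is zero at almost every position.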


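Negative Logits (token, logit; quotes mark token boundaries, so a leading space is part of the token)

",index"      -0.07
"Decor"       -0.07
"_Box"        -0.06
"Dam"         -0.06
" знов"       -0.06
"aaaaaaaa"    -0.06
"—for"        -0.06
"ioms"        -0.06
"orsk"        -0.06
" sideways"   -0.06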

Positive Logits (token, logit)

" United"     0.17
" UNITED"     0.13
"United"      0.12
" united"     0.08
" unite"      0.08
".Positive"   0.07
" liên"       0.07
" married"    0.07
"FT"          0.07
"UNIT"        0.07
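Logit tables like the two above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below shows that projection under stated assumptions: the vocabulary, decoder, and unembedding are hypothetical toy arrays, and it ignores the final RMSNorm and any scaling the dashboard may apply, so it will not reproduce the exact values listed.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_FEATURES, VOCAB = 16, 64, 10

# Hypothetical stand-ins for the tokenizer vocabulary, SAE decoder, and unembedding.
vocab = [f"tok{i}" for i in range(VOCAB)]
W_dec = rng.normal(size=(N_FEATURES, D_MODEL))
W_U = rng.normal(size=(D_MODEL, VOCAB))


def feature_logits(feature_idx: int, k: int = 3):
    """Return the k most positive and k most negative tokens for one feature direction."""
    scores = W_dec[feature_idx] @ W_U   # (VOCAB,) direct effect on each token's logit
    order = np.argsort(scores)          # ascending order of scores
    negative = [(vocab[j], float(scores[j])) for j in order[:k]]
    positive = [(vocab[j], float(scores[j])) for j in order[::-1][:k]]
    return positive, negative


pos, neg = feature_logits(feature_idx=5)
for token, score in pos:
    print(f"{token!r:>8} {score:+.2f}")
```

Read this way, the positive list says that when the feature fires it directly pushes up tokens such as " United" and " UNITED", which is consistent with the "UNITED STATES" explanation.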

Activation Density: 0.009%
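Assuming density means the fraction of token positions where the feature's activation is strictly positive (the usual definition; the dashboard's exact sampling is not stated here), a short calculation shows how rare 0.009% is. The array below is a toy example with 9 firings in 100,000 positions, not real activation data.

```python
import numpy as np


def activation_density(acts: np.ndarray) -> float:
    """Fraction of token positions where the feature activation is strictly positive."""
    return float(np.count_nonzero(acts > 0) / acts.size)


# Toy example: 9 nonzero activations among 100,000 token positions.
acts = np.zeros(100_000, dtype=np.float32)
acts[:9] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.009%

# Implications for the 1,024-token context size in the configuration.
expected_per_context = density * 1024   # about 0.092 firings per context
tokens_per_activation = 1 / density     # about one firing per 11,111 tokens
```

At this rate the feature fires in well under one context window in ten, which fits a detector for a specific heading string.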


No Known Activations