INDEX

Explanations

us

np_max-act · gemini-2.0-flash

This neuron fires on self-referential promotional language: mentions of the program’s or organization’s own name, “we/our,” and direct calls to action such as “contact us” or “let us know.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard)

Various

Features

131,072

Data Type

float32

Hook Name

blocks.11.hook_resid_post

Architecture

standard

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Negative Logits

_SET

-0.07

Generator

-0.07

PARSE

-0.07

ppard

-0.06

 epsilon

-0.06

istani

-0.06

LIKELY

-0.06

IsRequired

-0.06

 Shared

-0.06

 Sinh

-0.06

Positive Logits

 {↵↵↵↵

0.08

 збір

0.07

{↵↵↵

0.07

 =>{↵

0.07

;↵↵↵

0.06

);↵↵↵

0.06

 //↵↵

0.06

(coll

0.06

toHave

0.06

"}}>↵

0.06

Activations Density 0.061%
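
To put the density figure in context, here is a minimal sketch of how such a number could be computed and read. It assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero; the dashboard does not state its exact definition, so treat that as an assumption. The function names and the synthetic array are hypothetical and are not taken from the dashboard's own code or data.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position, with "active"
    meaning strictly greater than the threshold.
    """
    acts = np.asarray(acts, dtype=np.float32)
    if acts.size == 0:
        raise ValueError("acts must be non-empty")
    return float(np.count_nonzero(acts > threshold) / acts.size)


def expected_active_tokens(density, context_size):
    """Average number of active positions per context at a given density."""
    return density * context_size


# Synthetic stand-in data (not real activations): 6 active positions in 10,000.
demo_acts = np.zeros(10_000, dtype=np.float32)
demo_acts[:6] = 1.5
demo_density = activation_density(demo_acts)

# Reading the reported figure: 0.061% density at a context size of 1,024.
per_context = expected_active_tokens(0.061 / 100, 1024)
print(f"demo density: {demo_density:.4%}")
print(f"expected active tokens per 1,024-token context: {per_context:.3f}")
```

Under that assumption, a density of 0.061% at a context size of 1,024 works out to well under one active token per context on average, which suggests the feature is sparse.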

No Known Activations