Explanations

Self-references

np_max-act-logits · gemini-2.0-flash

The neuron detects first-person self-reference (speaker-focused pronouns and constructions indicating "I"/the narrator).

oai_token-act-pair · gpt-5-mini Triggered by @yooniel31

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_27/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.27.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
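To make the configuration concrete, here is a minimal sketch of how a sparse autoencoder could turn a residual-stream vector into feature activations. It assumes that "standard" means a plain ReLU autoencoder with an encoder matrix, an encoder bias, and a decoder bias subtracted from the input; the page itself does not spell this out. The names `sae_encode`, `w_enc`, `b_enc`, and `b_dec` are hypothetical, and the toy sizes (2 inputs, 3 features) stand in for the real residual width and the 131,072 features listed above.

```python
def relu(values):
    """Clamp negative pre-activations to zero."""
    return [max(0.0, v) for v in values]


def sae_encode(x, w_enc, b_enc, b_dec):
    """Encode one residual vector into feature activations.

    x     : list of floats, length d_model
    w_enc : d_model rows, each with n_features columns (hypothetical layout)
    b_enc : list of floats, length n_features
    b_dec : list of floats, length d_model (subtracted before encoding; an assumption)
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [
        sum(centered[i] * w_enc[i][j] for i in range(len(centered))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre_acts)


# Toy example: 2-dimensional "residual stream", 3 features.
x = [1.0, -2.0]
w_enc = [
    [1.0, 0.0, -1.0],
    [0.0, 1.0, 0.5],
]
b_enc = [0.0, 0.0, -0.5]
b_dec = [0.0, 0.0]

acts = sae_encode(x, w_enc, b_enc, b_dec)
print(acts)
```

In this toy case only the first feature is active, which mirrors the sparsity one would expect from a feature that fires on a small fraction of tokens.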

Negative Logits

"illustr": -0.07
"ポート": -0.07
" addslashes": -0.07
"GRADE": -0.07
".kind": -0.06
"ducation": -0.06
" Values": -0.06
"ArgumentNullException": -0.06
" caster": -0.06
".Cascade": -0.06

Positive Logits

" रन": 0.07
"/event": 0.06
"「你": 0.06
"’yi": 0.06
"ubo": 0.06
" zákona": 0.06
"=session": 0.06
" добав": 0.06
"nerRadius": 0.06
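Token/logit lists of this kind are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that idea on made-up numbers; it is an assumption about how the values were derived, not a confirmed description of this dashboard's pipeline. The names `logit_effects`, `top_and_bottom`, `decoder_row`, and `unembed`, and the tokens `tok_a`, `tok_b`, `tok_c`, are all hypothetical.

```python
def logit_effects(decoder_row, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector.

    decoder_row : list of floats, length d_model
    unembed     : dict mapping token string -> list of floats, length d_model
    """
    return {
        token: sum(d * u for d, u in zip(decoder_row, vector))
        for token, vector in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy data: a 2-dimensional decoder direction and a 3-token vocabulary.
decoder_row = [0.5, -0.25]
unembed = {
    "tok_a": [1.0, 0.0],
    "tok_b": [0.0, 1.0],
    "tok_c": [-1.0, 1.0],
}

effects = logit_effects(decoder_row, unembed)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```

Scores as small as the ones listed here (about ±0.06 to ±0.07) would suggest that the feature has only a weak direct effect on any single output token, which is consistent with the unrelated-looking tokens in both lists.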

Activations Density 0.050%
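As a rough worked example, and assuming the density is the fraction of token positions on which the feature has a nonzero activation (the page does not define it), 0.050% corresponds to about one firing per 2,000 tokens, or roughly half a firing per 1,024-token context:

```python
# Values copied from the page; the interpretation of "density" is an assumption.
density_percent = 0.050
context_size = 1024

density = density_percent / 100.0                     # fraction of tokens
tokens_per_activation = 1.0 / density                 # about 2000
expected_active_per_context = density * context_size  # about 0.512

print(round(tokens_per_activation), round(expected_active_per_context, 3))
```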

No Known Activations