Explanations

Self-reference
np_max-act · gemini-2.0-flash

This neuron detects first-person self-referential words and role/identity declarations (tokens like "I", "I'm", "am" and similar self-identifying phrases).
oai_token-act-pair · gpt-5-mini · Triggered by @yooniel31

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
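
To make the configuration concrete, here is a minimal sketch of how a sparse autoencoder maps residual-stream vectors to feature activations. It assumes that "standard" denotes a ReLU encoder with a linear decoder, which this page does not confirm. The parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`), the toy sizes, and the random weights are all illustrative and are not the published SAE.

```python
import numpy as np

# Toy sizes for illustration only. The SAE on this page has 131,072 features,
# and Llama 3.1 8B has a residual width of 4096.
D_MODEL, N_FEATURES = 16, 64

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised parameters (not the released weights).
W_enc = rng.normal(scale=0.1, size=(D_MODEL, N_FEATURES)).astype(np.float32)
b_enc = np.zeros(N_FEATURES, dtype=np.float32)
W_dec = rng.normal(scale=0.1, size=(N_FEATURES, D_MODEL)).astype(np.float32)
b_dec = np.zeros(D_MODEL, dtype=np.float32)


def encode(resid):
    """Map residual vectors of shape (..., D_MODEL) to non-negative feature activations."""
    return np.maximum((resid - b_dec) @ W_enc + b_enc, 0.0)


def decode(feature_acts):
    """Reconstruct residual vectors from feature activations."""
    return feature_acts @ W_dec + b_dec


# Stand-in for activations captured at blocks.11.hook_resid_post
# (batch of 4 sequences, 8 tokens each).
resid = rng.normal(size=(4, 8, D_MODEL)).astype(np.float32)
feature_acts = encode(resid)
reconstruction = decode(feature_acts)
```

Under this assumption, each of the 131,072 features corresponds to one column of the encoder and one row of the decoder, and the activation values shown on a dashboard are entries of `feature_acts` for a single feature index.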

Negative Logits

Token        Logit
"IVING"      -0.08
"ario"       -0.08
"/list"      -0.07
"OVE"        -0.07
"IRE"        -0.07
"ITA"        -0.07
"_sphere"    -0.07
"ijo"        -0.07
"ARIO"       -0.07
".Hidden"    -0.06

Positive Logits

Token        Logit
" licz"      0.07
" FIFA"      0.06
"];↵↵↵"      0.06
"να"         0.06
".getApp"    0.06
" neby"      0.06
" essa"      0.06
" porte"     0.06
"'nde"       0.05
" pochop"    0.05
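
The page does not say how the logit lists are produced. A common approach for dashboards of this kind is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the result. The sketch below assumes that approach; `top_logit_effects`, the toy matrices, and the placeholder vocabulary are hypothetical, and a real pipeline may also apply the final normalization layer.

```python
import numpy as np


def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    decoder_direction: (d_model,) decoder vector of one SAE feature.
    W_U: (d_model, vocab_size) unembedding matrix.
    vocab: sequence of token strings, one per column of W_U.
    """
    effects = decoder_direction @ W_U          # direct effect on each token's logit
    order = np.argsort(effects)                # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy data standing in for real model weights.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
W_U = rng.normal(size=(d_model, vocab_size))
decoder_direction = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]

negative, positive = top_logit_effects(decoder_direction, W_U, vocab, k=3)
for token, effect in positive:
    print(f"{token!r}\t{effect:+.2f}")
```

Read this way, the values listed for this feature all fall within ±0.08, and the top tokens bear no obvious relation to the first-person explanation, which would fit a feature at layer 11 acting mainly through later layers rather than directly on the output logits.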

Activations Density: 0.175%
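
The page does not define the density figure. Assuming it is the fraction of sampled tokens on which the feature has a nonzero activation, 0.175% corresponds to about 1.8 active tokens per 1,024-token context (0.00175 × 1024 ≈ 1.79). A minimal sketch of that calculation, with a hypothetical helper `activation_density` and synthetic data:

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    feature_acts = np.asarray(feature_acts)
    return float((feature_acts > threshold).mean())


# Synthetic example: the feature fires on 7 of 4,000 tokens.
acts = np.zeros(4000)
acts[:7] = 1.5
density = activation_density(acts)

# Expected number of active tokens in one 1,024-token context at 0.175% density.
expected_per_context = 0.00175 * 1024

print(f"density = {density:.3%}")
print(f"expected active tokens per context = {expected_per_context:.2f}")
```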
