Explanations

I

np_max-act · gemini-2.0-flash

first-person self-referential pronouns in user questions (e.g., the speaker referring to themselves with I/me/my).

oai_token-act-pair · gpt-5 Triggered by @vetterc0

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

kb              -0.06
 уровне         -0.06
 lately         -0.06
 scholarships   -0.06
ре              -0.06
 cursos         -0.06
��              -0.06
printer         -0.06
 нос            -0.06
=result         -0.06

Positive Logits

izzle           0.08
setq            0.07
 растений       0.07
incerely        0.07
 thay           0.07
Flush           0.06
 orch           0.06
getParameter    0.06
("{\"           0.06
.retrieve       0.06
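
Per-token logit lists like the ones above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption, using made-up toy numbers rather than anything from this feature. The names `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def top_logit_tokens(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 2,
) -> Tuple[Ranked, Ranked]:
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding row (assumed convention, not confirmed by the page).

    Returns (most positive k tokens, most negative k tokens).
    """
    if len(unembed) != len(vocab):
        raise ValueError("unembed must have one row per vocabulary token")
    # One score per token: <decoder_dir, unembedding row for that token>.
    scores = [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]
    return positive, negative


# Toy example with a 3-dimensional residual stream and a 4-token vocabulary.
toy_vocab = ["alpha", "beta", "gamma", "delta"]
toy_unembed = [
    [0.5, 0.2, 0.0],
    [0.0, 0.9, 0.25],
    [0.25, 0.0, 0.25],
    [-0.5, 0.1, 0.25],
]
toy_direction = [1.0, 0.0, -1.0]

positive, negative = top_logit_tokens(toy_direction, toy_unembed, toy_vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

The near-uniform magnitudes in the lists above (all between 0.06 and 0.08 in absolute value) would, under this reading, mean the feature nudges no particular output token strongly.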

Activations Density 0.025%
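
An activation density of 0.025% can be read as the feature firing on roughly 1 in 4,000 token positions. The page does not define the metric, so the sketch below assumes it is the fraction of sampled token positions with a strictly positive activation. The helper name `activation_density` and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of token positions where the feature is active (> 0).

    Assumed definition; the dashboard does not spell out its formula.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Toy data: a single active position among 4,000 tokens.
toy_activations = [0.0] * 4000
toy_activations[123] = 1.7

density = activation_density(toy_activations)
print(f"{density:.3%}")
```

With the toy data, one active position out of 4,000 gives 1 / 4000 = 0.00025, which matches the 0.025% figure shown on the page.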

No Known Activations
