Explanations

testing expected output

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token            Logit
"aux"            -0.07
"Interview"      -0.07
"irie"           -0.07
"用力"           -0.07
" Attack"        -0.07
"Taking"         -0.07
"dux"            -0.07
" 이름"          -0.07
" meilleur"      -0.07
"ervention"      -0.06

Positive Logits

Token            Logit
" worksheets"    0.07
"zk"             0.07
"CED"            0.07
"_DST"           0.07
" semanas"       0.07
" cerc"          0.06
"生育"           0.06
"nf"             0.06
"oralType"       0.06
"他还"           0.06

Activations Density: 0.063%
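
As a rough reading of the density figure: the page does not define it, so the sketch below assumes it is the fraction of token positions on which the feature has a nonzero activation, given in percent. Under that assumption, the expected number of active positions in one full context follows directly from the density and the context size listed above. The function name expected_active_tokens is hypothetical and not part of any library.

```python
def expected_active_tokens(density_percent: float, context_size: int) -> float:
    """Expected number of token positions on which the feature fires.

    Assumption: density is the per-token firing fraction, expressed in percent.
    """
    return density_percent / 100.0 * context_size


# Values taken from this page: density 0.063%, context size 1,024 tokens.
expected = expected_active_tokens(0.063, 1024)
print(f"{expected:.3f} expected active positions per context")
```

If the assumption holds, the feature fires on fewer than one position per full 1,024-token context on average, which fits with the page listing no known activations.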

No Known Activations
