Explanations

community, rights, faith, health

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_19/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.19.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

院院士

-0.07

-bedroom

-0.07

%=

-0.07

颞

-0.06

({})↵

-0.06

 empath

-0.06

 welcoming

-0.06

Portland

-0.06

焊

-0.06

 volatile

-0.06

Positive Logits

_album

0.08

后果

0.07

_OVER

0.07

Ctr

0.07

.Menu

0.07

 ACTION

0.07

music

0.07

	packet

0.07

BLE

0.06

︺

0.06
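The two logit lists above pair tokens with a signed score. The page does not say how those scores are computed. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and report the most positive and most negative tokens. The sketch below illustrates that idea on toy data. The function names (`logit_effects`, `top_and_bottom`), the two-dimensional vectors, and the three-token vocabulary are hypothetical and are not taken from this dashboard.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector.

    decoder_dir: list of floats (hypothetical feature direction).
    unembed: dict mapping token -> list of floats of the same length.
    """
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy values, assumed purely for illustration.
decoder_dir = [1.0, -1.0]
unembed = {"a": [0.5, 0.0], "b": [0.0, 0.5], "c": [0.25, 0.25]}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```

Under this assumption, scores as small as the ones listed (roughly ±0.06 to ±0.08) would mean the feature barely shifts any single token's logit.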

Activations Density 0.751%
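Activation density is usually read as the fraction of sampled token positions on which the feature has a non-zero activation. The page does not define it, so that reading is an assumption. The sketch below computes the quantity for a toy list of activations. The function name `activation_density`, the `threshold` parameter, and the data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, assumed for illustration: 3 firing positions out of 400.
toy = [0.0] * 397 + [0.8, 1.3, 0.2]
density = activation_density(toy)
print(f"{density:.3%}")  # 0.750%
```

On this reading, a density of 0.751% means the feature fires on roughly one token in every 133.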

No Known Activations