Configuration: andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_23/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.23.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

护身    -0.08
(bc    -0.07
acier    -0.07
<num    -0.07
(mi    -0.07
mak    -0.07
<boost    -0.06
 dirt    -0.06
 Drugs    -0.06
馃    -0.06

Positive Logits

非常喜欢    0.08
endency    0.07
 brawl    0.07
 comfortably    0.07
下调    0.07
切れ    0.07
 undesirable    0.07
банк    0.07
 bankruptcy    0.07
.LayoutStyle    0.07

Activations Density: 0.003%
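
To put the density figure in perspective, the short sketch below converts it into more intuitive quantities. It assumes that "activations density" means the fraction of tokens on which the feature is non-zero; the page does not define the term, so treat that reading as an assumption. The variable names are illustrative only.

```python
# Assumption: density = fraction of tokens on which the feature fires.
density_percent = 0.003  # "Activations Density" reported on this page
context_size = 1024      # "Context Size" reported on this page

# Convert the percentage to a plain fraction.
density = density_percent / 100

# Average number of tokens between activations.
tokens_per_activation = 1 / density

# Expected number of active tokens in one full-length context.
expected_active_per_context = density * context_size

print(f"about one activation per {tokens_per_activation:,.0f} tokens")
print(f"about {expected_active_per_context:.4f} active tokens per context")
```

Under that assumption the feature fires on roughly one token in 33,000, or about 0.03 tokens per 1,024-token context, which would make it a very sparse feature.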

No Known Activations
