Explanations

No Explanations Found

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"+="  -0.08
"[r"  -0.07
" ()=>{↵"  -0.07
"崴"  -0.07
"⯈"  -0.07
"เทศ"  -0.07
" bombings"  -0.06
"nye"  -0.06
"♁"  -0.06
"ሲ"  -0.06

Positive Logits

" Herald"  0.08
"散热"  0.08
" neighbour"  0.07
" הנוכ"  0.07
" electrodes"  0.07
"keys"  0.07
"Sequential"  0.07
"Cell"  0.07
"知识分子"  0.07
" substitution"  0.07

Activations Density 0.006%

No Known Activations