Explanations

Russian code-related

np_max-act-logits · gemini-2.0-flash

Configuration: andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_19/trainer_1
Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.19.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"decorators"  -0.07
" disciplined"  -0.07
"WINDOWS"  -0.07
"옳"  -0.07
"//-----------------------------------------------------------------------------↵"  -0.07
"YL"  -0.07
"спект"  -0.07
"/R"  -0.07
"זכ"  -0.06
"新手"  -0.06

Positive Logits

"-con"  0.07
"推送"  0.07
"表面"  0.07
" happened"  0.06
"tron"  0.06
" much"  0.06
"宫"  0.06
"棒"  0.06
"光芒"  0.06
"be"  0.06

Activations Density 0.112%
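
A density figure like the one above is usually read as the share of token positions on which the feature fires. The sketch below shows that calculation under the assumption that "fires" means an activation strictly greater than zero. The function name and the sample activations are hypothetical, chosen only so that the result matches the 0.112% shown here; they are not taken from the dashboard's data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 112 active positions out of 100,000 tokens.
acts = [0.0] * 99_888 + [2.3] * 112
density = activation_density(acts)
print(f"{density:.3%}")
```

A value this low would indicate a sparse feature: on the hypothetical data, it is active on roughly one token in every 900.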

No Known Activations