INDEX

Explanations

stress (np_max-act-logits · gemini-2.0-flash)

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_19/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.19.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" oluşturul"     -0.08
" validating"    -0.07
"layın"          -0.07
" parentheses"   -0.07
" מאפשר"         -0.07
"das"            -0.06
"ירו"            -0.06
"脫"             -0.06
"举报电话"       -0.06
"pwd"            -0.06

Positive Logits

" =============================================================================↵"   0.09
" essentials"      0.07
".ExecuteReader"   0.07
"Uni"              0.07
" Oven"            0.06
"ками"             0.06
"ۂ"                0.06
"ught"             0.06
" Zombies"         0.06
" drunk"           0.06

Activations Density 0.010%
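
The density figure above can be read as a simple ratio. The sketch below is a minimal illustration, assuming "Activations Density" means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature is active.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 10,000 token positions.
sample = [0.0] * 9999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.010%
```

Under that reading, a density of 0.010% corresponds to roughly one active token position per 10,000 sampled.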

stress

No Known Activations