Explanations

numbers in text (np_max-act-logits · gemini-2.0-flash)

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_19/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.19.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"Generally": -0.07
"วก": -0.07
" Additionally": -0.07
"掴": -0.07
" fflush": -0.07
" łazienk": -0.07
"넙": -0.07
"┷": -0.07
"뉠": -0.07
".SelectSingleNode": -0.07

Positive Logits

"交接": 0.07
"字段": 0.07
" والإ": 0.07
" continental": 0.07
" motorcycle": 0.07
" memoir": 0.07
" organic": 0.06
"伦敦": 0.06
"Franc": 0.06
" powerhouse": 0.06

Activations Density: 0.150%

No Known Activations