Explanations

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_23/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.23.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" copyrighted"       -0.07
" Amend"             -0.07
" Terr"              -0.07
" Sour"              -0.06
" getContentPane"    -0.06
" περί"              -0.06
"RR"                 -0.06
" películ"           -0.06
".have"              -0.06
" predatory"         -0.06

Positive Logits

"UpDown"       0.07
"Else"         0.07
"je"           0.07
"ğiniz"        0.07
" yeniden"     0.06
" zápas"       0.06
".Location"    0.06
"]*)"          0.06
"seys"         0.06
" srov"        0.06

Activations Density 0.004%
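
The density figure above is not defined on this page. A minimal sketch, assuming it denotes the fraction of sampled token positions on which the feature's activation is greater than zero, shows how such a number could be computed. The function name `activation_density` and the sample array are hypothetical and purely illustrative; they are not taken from the dashboard or from any library.

```python
import numpy as np


def activation_density(acts: np.ndarray) -> float:
    """Return the fraction of token positions where the activation is > 0.

    Assumption: "density" means the share of positions with a nonzero
    (positive) feature activation. The dashboard may define it differently.
    """
    acts = np.asarray(acts)
    if acts.size == 0:
        raise ValueError("acts must contain at least one activation value")
    return float(np.count_nonzero(acts > 0) / acts.size)


# Hypothetical data: 100,000 token positions, feature active on 4 of them.
acts = np.zeros(100_000)
acts[[10, 2_000, 50_000, 99_999]] = [0.5, 1.2, 0.3, 2.0]

density = activation_density(acts)
print(f"{density:.3%}")  # 0.004%
```

Under this assumption, a density of 0.004% corresponds to roughly 4 active positions per 100,000 tokens.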

No Known Activations