Explanations

Avoiding negative behavior

np_max-act-logits · gemini-2.0-flash

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_27/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.27.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

 saga    -0.07
 posi    -0.07
 Thesis    -0.07
sta    -0.07
ikk    -0.07
 Quincy    -0.06
 census    -0.06
्बर    -0.06
の    -0.06
iations    -0.06

Positive Logits

 BigDecimal    0.06
ऊ    0.06
 khuyến    0.06
latable    0.06
----------------------------------------------------------------------↵    0.06
šk    0.06
")[    0.06
grupo    0.06
 postData    0.06
INCT    0.06

Activations Density 0.058%
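
To put the density figure in perspective, the short sketch below converts it into an expected number of active tokens per context window. It assumes that "activations density" means the fraction of tokens on which the feature has a non-zero activation, and that the 1,024-token context size listed in the configuration is the relevant window. The function name `expected_active_tokens` is hypothetical and is not part of any dashboard API.

```python
# Values copied from the dashboard; the interpretation of "density" as
# "fraction of tokens with non-zero activation" is an assumption.
ACTIVATION_DENSITY_PERCENT = 0.058
CONTEXT_SIZE = 1024


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected count of tokens on which the feature fires in n_tokens tokens."""
    return density_percent / 100.0 * n_tokens


# Expected number of firing tokens in one full context window.
per_context = expected_active_tokens(ACTIVATION_DENSITY_PERCENT, CONTEXT_SIZE)

# Average number of tokens between activations (reciprocal of the density).
tokens_per_activation = 100.0 / ACTIVATION_DENSITY_PERCENT

print(f"expected active tokens per context: {per_context:.3f}")
print(f"roughly one activation every {tokens_per_activation:.0f} tokens")
```

Under these assumptions the feature fires on fewer than one token per 1,024-token context on average, which marks it as a sparse feature.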

No Known Activations