
Explanations

Shortcomings and failures

np_max-act-logits · gemini-2.0-flash


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_27/trainer_1

Dataset (Dashboard): Various

Features: 131,072

Data Type: float32

Hook Name: blocks.27.hook_resid_post

Architecture: standard

Context Size: 1,024

Dataset: monology/pile-uncopyrighted


Negative Logits

 utterly

-0.07

sho

-0.07

خی

-0.07

_qs

-0.07

 bức

-0.07

 invoked

-0.06

INV

-0.06

探

-0.06

qv

-0.06

劳

-0.06

Positive Logits

_pointer

0.06

"))↵

0.06

braska

0.05

Beans

0.05

%;↵

0.05

ремя

0.05

')")↵

0.05

 takeover

0.05

 }?>↵

0.05

 UserService

0.05

Activations Density 0.116%
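
For context on the density figure above, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of token positions where the feature's activation exceeds a threshold; the page does not state this definition. The function name `activation_density`, the threshold of 0.0, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature is active.

    Assumption: "active" means activation > threshold. The dashboard does
    not document its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 12 of which are active.
toy_activations = [0.0] * 9988 + [0.5] * 12
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.116% would correspond to the feature firing on roughly 1 in every 860 tokens.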

Shortcomings and failures

No Comments

No Known Activations
