Explanations

as

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_23/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.23.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"Το"            -0.07
"\tinitial"     -0.07
" Lebens"       -0.07
" Mech"         -0.07
" chilly"       -0.06
" deactivated"  -0.06
"icap"          -0.06
" dados"        -0.06
" gigantic"     -0.06
" yelled"       -0.06

Positive Logits

"_AUT"       0.07
" brat"      0.06
"-char"      0.06
"Poe"        0.06
" scri"      0.06
" BOOL"      0.06
" сос"       0.06
" reminded"  0.06
"Browsable"  0.05
" Prev"      0.05

Activations Density 0.249%
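
The density figure above is presumably the fraction of token positions on which the feature is active. A minimal sketch under that assumption follows; the function name `activation_density`, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 here, chosen for illustration).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 400 token positions.
toy_activations = [0.0] * 399 + [1.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

On this reading, a density of 0.249% corresponds to the feature firing on roughly one token in every 400.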

No Known Activations