Explanations

bel

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_27/trainer_1

Dataset (Dashboard)

Various

Features

131,072

Data Type

float32

Hook Name

blocks.27.hook_resid_post

Architecture

standard

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Negative Logits

 Restrictions

-0.07

Become

-0.07

 खबर

-0.07

Criteria

-0.07

 AssertionError

-0.06

 clientes

-0.06

 نزدیک

-0.06

_strerror

-0.06

 이상

-0.06

ThanOrEqualTo

-0.06

Positive Logits

 się

0.07

ellig

0.07

élé

0.07

.AD

0.07

貝

0.07

จร

0.07

 Dolphin

0.06

 maxWidth

0.06

 гем

0.06

BAR

0.06

Activations Density 0.006%
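
To put the density figure in perspective, here is a minimal sketch of the arithmetic. It assumes that "density" means the fraction of tokens on which the feature has a nonzero activation, which this page does not define. The function name `expected_active_tokens` is hypothetical and used only for illustration. The inputs (0.006% and a context size of 1,024) are taken from the values listed on this page.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumption: density is the fraction of tokens with nonzero activation,
    expressed as a percentage.
    """
    return density_percent / 100.0 * n_tokens


# Values from this page: density 0.006%, context size 1,024 tokens.
per_context = expected_active_tokens(0.006, 1024)
per_million = expected_active_tokens(0.006, 1_000_000)

print(f"{per_context:.4f} expected active tokens per 1,024-token context")
print(f"{per_million:.0f} expected active tokens per million tokens")
```

Under that assumption the feature would fire on roughly 60 tokens per million, or on average well under one token per full context window, so most contexts would contain no activation at all.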

bel

No Known Activations
