INDEX

Explanations

talent

np_max-act · gemini-2.0-flash

phrases related to talent and professionalism.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
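
To give a sense of scale for the configuration above, here is a rough sketch that records the listed values in a small Python dataclass and estimates the size of the SAE weight matrices. The class and field names are hypothetical. The residual width of 4096 is the hidden size of Llama 3.1 8B and is not stated on this page. The estimate assumes a "standard" SAE has one encoder and one decoder matrix of equal shape, and it ignores biases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    """Hypothetical container for the values listed on this page."""
    sae_id: str
    hook_name: str
    n_features: int
    context_size: int
    dtype_bytes: int   # float32 -> 4 bytes per value
    d_model: int       # assumption: Llama 3.1 8B hidden size, not shown on the page

    def weight_bytes(self) -> int:
        # Assumption: one encoder (d_model x n_features) and one decoder
        # (n_features x d_model) matrix; biases are ignored.
        return 2 * self.n_features * self.d_model * self.dtype_bytes


cfg = SaeConfig(
    sae_id="andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1",
    hook_name="blocks.11.hook_resid_post",
    n_features=131_072,
    context_size=1_024,
    dtype_bytes=4,
    d_model=4_096,
)

print(f"Approximate weight size: {cfg.weight_bytes() / 2**30:.1f} GiB")
```

Under these assumptions the two matrices come to about 4 GiB in float32.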

Negative Logits

Token          Logit
" breaking"    -0.06
"Broken"       -0.06
"broken"       -0.06
".Room"        -0.06
" BaseModel"   -0.06
" safe"        -0.06
" Büyük"       -0.06
"Chr"          -0.06
" Becker"      -0.06
"§ظ"           -0.05

Positive Logits

Token          Logit
" talent"      0.16
" talents"     0.15
" Talent"      0.13
" talented"    0.12
"Tal"          0.10
"tal"          0.09
"人才"          0.08
" gifted"      0.08
"andise"       0.08
"才"            0.08
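
Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The page does not say how its values were computed, so the following is only a minimal sketch of that general technique. The function names are hypothetical, and the three-dimensional vectors are invented toy numbers, not values from the real model.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy data (assumption): a 3-dimensional "residual stream" and four tokens.
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    " talent": [0.16, 0.0, 0.0],
    " gifted": [0.08, 0.1, 0.0],
    " breaking": [0.0, 0.0, 0.12],
    " safe": [-0.02, 0.3, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
print("positive:", top_k(effects, 2, positive=True))
print("negative:", top_k(effects, 2, positive=False))
```

In this toy setup the positive list is led by " talent" and the negative list by " breaking". In a real model the same projection runs over the full vocabulary.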

Activations Density 0.005%
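
As a rough reading of this number, the sketch below converts the density into more intuitive quantities. It assumes that the density is the fraction of token positions on which the feature has a nonzero activation, and that activations are spread evenly across contexts. Neither assumption is confirmed by the page.

```python
density_percent = 0.005   # from the page
context_size = 1024       # tokens per context, from the configuration

# Assumption: density = fraction of tokens where the feature is active.
density = density_percent / 100

tokens_per_activation = 1 / density          # average gap between activations
expected_per_context = density * context_size

print(f"About 1 activation per {tokens_per_activation:,.0f} tokens")
print(f"About {expected_per_context:.4f} activations per {context_size}-token context")
```

Under these assumptions the feature fires on roughly one token in 20,000, or about once in every 20 full-length contexts.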

No Known Activations