
Explanations

names mentioned in an online conversation or forum format

oai_token-act-pair · gpt-3.5-turbo Triggered by @bot

references to dates and personal identifiers

oai_token-act-pair · gpt-4o-mini Triggered by @bot


Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 12-res-jb

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.11.hook_resid_post

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

24,576

Data Type

torch.float32

Hook Point

blocks.11.hook_resid_post

Architecture

standard

Context Size

128

Dataset

Skylion007/openwebtext

Hook Point Layer

11

Activation Function

relu
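The configuration describes a "standard" SAE with a ReLU activation on GPT-2 small's final residual stream (768 dimensions, expanded to 24,576 features, i.e. 32x). The sketch below assumes the common SAELens-style formulation f = ReLU((x - b_dec) W_enc + b_enc) and x_hat = f W_dec + b_dec. The parameter names and helper functions are illustrative, and the weights are random placeholders, not the trained jbloom weights.

```python
import numpy as np

# Shapes from the configuration above: GPT-2 small's residual stream has
# 768 dimensions, and this SAE has 24,576 features (a 32x expansion).
D_MODEL = 768
D_SAE = 24_576


def make_random_sae(d_model: int, d_sae: int, seed: int = 0) -> dict[str, np.ndarray]:
    """Placeholder parameters with the right shapes and dtype (torch.float32 -> np.float32).

    Random values for illustration only; these are NOT the trained weights from
    jbloom/GPT2-Small-SAEs-Reformatted.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / d_model ** 0.5  # plain Python float, so arrays stay float32
    W_enc = rng.standard_normal((d_model, d_sae), dtype=np.float32)
    W_enc *= scale
    W_dec = rng.standard_normal((d_sae, d_model), dtype=np.float32)
    W_dec *= scale
    return {
        "W_enc": W_enc,
        "b_enc": np.zeros(d_sae, dtype=np.float32),
        "W_dec": W_dec,
        "b_dec": np.zeros(d_model, dtype=np.float32),
    }


def encode(sae: dict[str, np.ndarray], x: np.ndarray) -> np.ndarray:
    """Assumed 'standard' ReLU encoder: f = ReLU((x - b_dec) @ W_enc + b_enc)."""
    pre_acts = (x - sae["b_dec"]) @ sae["W_enc"] + sae["b_enc"]
    return np.maximum(pre_acts, 0.0)


def decode(sae: dict[str, np.ndarray], f: np.ndarray) -> np.ndarray:
    """Linear decoder: x_hat = f @ W_dec + b_dec."""
    return f @ sae["W_dec"] + sae["b_dec"]


sae = make_random_sae(D_MODEL, D_SAE)
# Stand-in for 4 token positions of blocks.11.hook_resid_post activations.
x = np.random.default_rng(1).standard_normal((4, D_MODEL), dtype=np.float32)
f = encode(sae, x)       # feature activations, shape (4, 24576)
x_hat = decode(sae, f)   # reconstruction, shape (4, 768)
print(f.shape, x_hat.shape)
```

Subtracting b_dec before encoding is how the SAELens standard formulation is usually written; treat it as an assumption about this particular SAE rather than something stated on the page.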


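Negative Logits

Tokens are shown as GPT-2 byte-level BPE strings; where that spelling is garbled, the decoded character follows the value.

' ÂŃ'     -2.11   space + soft hyphen (U+00AD)
'–'       -1.42
'âĢ³'     -1.40   double prime ″ (U+2033)
'âĢĲ'     -1.29   hyphen ‐ (U+2010)
'ÂŃ'      -1.17   soft hyphen (U+00AD)
'—'       -1.17
'âĢ²'     -1.08   prime ′ (U+2032)
' âĪĴ'    -0.99   space + minus sign − (U+2212)
' Isis'   -0.94
'–'       -0.92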
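Several negative-logit tokens appear in GPT-2's byte-level BPE alphabet, where every raw byte is mapped to a printable character. The sketch below reproduces the byte-to-unicode table from OpenAI's GPT-2 encoder and inverts it to recover the real characters. `decode_token` is a hypothetical helper name, and treating a displayed leading space as GPT-2's `Ġ` space marker is an assumption about how the page renders tokens.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's byte -> printable character table (as in OpenAI's gpt-2 encoder.py)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, soft hyphen, ...) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into the text it represents."""
    # Assumption: the page shows GPT-2's space marker 'Ġ' (U+0120) as a plain space.
    token = token.replace(" ", "\u0120")
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Some GPT-2 tokens are partial UTF-8 sequences; 'replace' keeps those from raising.
    return raw.decode("utf-8", errors="replace")


for tok in [" ÂŃ", "âĢ³", "âĢĲ", "âĢ²", " âĪĴ"]:
    text = decode_token(tok)
    print(repr(tok), "->", [f"U+{ord(c):04X}" for c in text])
```

Only apply this to tokens still in byte-level form: characters the page has already decoded, such as '–', are not keys of `BYTE_DECODER` and would raise `KeyError`.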

Positive Logits

'...'      2.35
',...'     2.30
'....'     2.11
'...?'     2.09
'..."'     2.01
'"...'     1.99
'......'   1.98
'.....'    1.94
'.......'  1.92
'--'       1.89
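Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the extremes. The page does not say whether it also folds in the final LayerNorm or any other scaling, so the following is a sketch under the plain-projection assumption. `logit_weights` and `top_and_bottom` are hypothetical helper names, and the matrices are random placeholders.

```python
import numpy as np


def logit_weights(w_dec_row: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Direct effect of one feature on every output logit.

    w_dec_row: the feature's decoder direction, shape (d_model,)
    W_U:       the model's unembedding matrix, shape (d_model, d_vocab)
    """
    return w_dec_row @ W_U


def top_and_bottom(weights: np.ndarray, k: int = 10) -> tuple[np.ndarray, np.ndarray]:
    """Token ids of the k most positive and the k most negative logit weights."""
    order = np.argsort(weights)      # ascending
    positive = order[::-1][:k]       # largest first
    negative = order[:k]             # most negative first
    return positive, negative


# Toy demo with random matrices. GPT-2 small really has d_model=768 and a
# 50,257-token vocabulary; its trained W_U would replace this placeholder.
rng = np.random.default_rng(0)
d_model, d_vocab = 768, 1_000
W_U = rng.standard_normal((d_model, d_vocab))
w_dec_row = rng.standard_normal(d_model)
weights = logit_weights(w_dec_row, W_U)
positive, negative = top_and_bottom(weights, k=10)
print("positive:", positive, weights[positive].round(2))
print("negative:", negative, weights[negative].round(2))
```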

Activations Density 0.467%
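Activation density is conventionally the fraction of token positions at which a feature's activation is above zero. The sketch assumes that definition; `activation_density` is a hypothetical helper. Relating 0.467% to the 24,576 prompts x 128 tokens dashboard sample is also an assumption, since the page does not say which tokens the density was measured over.

```python
import numpy as np


def activation_density(feature_acts: np.ndarray) -> float:
    """Fraction of token positions at which the feature is active (> 0)."""
    return np.count_nonzero(feature_acts > 0) / feature_acts.size


# Toy activations for 2 prompts x 4 tokens: the feature fires at 2 of 8 positions.
toy_acts = np.array([[0.0, 1.2, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 3.4]])
print(activation_density(toy_acts))  # 0.25

# Scale check, assuming the density were measured over the dashboard sample.
n_tokens = 24_576 * 128                  # 3,145,728 token positions
expected_active = n_tokens * 0.467 / 100
print(n_tokens, round(expected_active))  # 3145728 14691
```

Under that assumption the feature would fire on roughly 14,700 of the sampled token positions; the displayed percentage is itself rounded, so the count is approximate.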


No Known Activations