Explanations

phrases indicating personal experiences or interactions

oai_token-act-pair · gpt-3.5-turbo

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.10.hook_resid_pre

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

24,576

Data Type

torch.float32

Hook Point

blocks.10.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

Skylion007/openwebtext

Hook Point Layer

10

Activation Function

relu
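The configuration above names a "standard" architecture with a relu activation. A minimal sketch of what such an encoder typically computes is shown below. It assumes the common sparse-autoencoder form f = relu((x - b_dec) · W_enc + b_enc); whether this particular SAE subtracts the decoder bias from its input is an assumption, and the names `sae_encode`, `W_enc`, `b_enc`, and `b_dec` as well as the toy 2-dimensional weights are hypothetical, not taken from the released checkpoint.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def sae_encode(x, W_enc, b_enc, b_dec):
    """Return feature activations f = relu((x - b_dec) @ W_enc + b_enc).

    x      : residual-stream vector, length d_model
    W_enc  : d_model x n_features matrix (list of rows)
    b_enc  : encoder bias, length n_features
    b_dec  : decoder bias, length d_model (assumed to be subtracted first)
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [
        sum(c * W_enc[i][j] for i, c in enumerate(centered)) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre_acts)


# Toy example: d_model = 2, n_features = 3 (the real SAE has 24,576 features).
x = [1.0, 2.0]
W_enc = [[1.0, -1.0, 0.5],
         [0.0, 1.0, -2.0]]
b_enc = [0.0, -0.5, 0.0]
b_dec = [0.5, 0.5]

features = sae_encode(x, W_enc, b_enc, b_dec)
print(features)
```

In the toy run the third feature has a negative pre-activation and is zeroed by the relu, which is the mechanism that keeps most features inactive on any given token.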

Negative Logits

 accordingly

-0.65

lying

-0.65

arry

-0.64

interstitial

-0.64

umbing

-0.63

peak

-0.62

iter

-0.62

anwhile

-0.62

 thats

-0.61

annel

-0.60

Positive Logits

 opportunity

1.39

 privilege

1.28

 misfortune

1.21

 pleasure

1.13

 courage

1.07

 chance

1.06

 option

1.03

 utmost

0.99

 displeasure

0.97

 guts

0.97
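The page does not say how the negative and positive logit lists above are produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the resulting score. The sketch below illustrates that idea on invented toy numbers; `logit_effects`, `top_and_bottom`, the placeholder tokens, and all weights are hypothetical and do not reproduce the values listed on this page.

```python
def logit_effects(decoder_direction, W_U):
    """Project a feature's decoder direction onto each vocabulary token.

    decoder_direction : length d_model
    W_U               : d_model x vocab_size unembedding matrix (list of rows)
    """
    vocab_size = len(W_U[0])
    return [
        sum(d * W_U[i][t] for i, d in enumerate(decoder_direction))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Toy setup: d_model = 2, vocabulary of four placeholder tokens.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_direction = [1.0, -1.0]
W_U = [[0.5, -0.25, 0.0, 1.0],
       [-0.5, 0.25, 0.0, 0.5]]

effects = logit_effects(decoder_direction, W_U)
positive, negative = top_and_bottom(effects, tokens, k=1)
print(positive, negative)
```

Under this reading, a positive score means that activating the feature pushes the model toward predicting that token next, and a negative score means it pushes away from it.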

Activations Density 0.097%
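To put the density figure in context, the arithmetic below combines it with the dashboard prompt counts listed earlier (24,576 prompts of 128 tokens each). It assumes that the density is the fraction of dashboard tokens on which the feature has a nonzero activation; the page does not define the term, so treat the resulting count as a rough estimate only.

```python
# Figures taken from the dashboard configuration on this page.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent_x1000 = 97  # 0.097% expressed as parts per 100,000

total_tokens = num_prompts * tokens_per_prompt

# Integer arithmetic avoids floating-point rounding surprises.
# Assumption: density = (tokens with nonzero activation) / (total tokens).
approx_active_tokens = total_tokens * density_percent_x1000 // 100_000

print(total_tokens)
print(approx_active_tokens)
```

Under that assumption the feature fires on roughly three thousand of about 3.1 million tokens, which is consistent with a sparse, fairly specific feature.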
