
Explanations

adjectives describing a negative or unpleasant situation

oai_token-act-pair · gpt-3.5-turbo

words associated with a bleak or serious tone

oai_token-act-pair · gpt-4o-mini Triggered by @bot


Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 0-res-jb

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.0.hook_resid_pre

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 24,576
Data Type: torch.float32
Hook Point: blocks.0.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: Skylion007/openwebtext
Hook Point Layer:
Activation Function: relu
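The configuration above describes a standard-architecture sparse autoencoder (SAE) with a ReLU activation, attached at blocks.0.hook_resid_pre. As a rough illustration only, the sketch below shows the common form of such a forward pass on toy dimensions. The weight names (W_enc, b_enc, W_dec, b_dec), the subtraction of the decoder bias before encoding, and all numbers are assumptions made for illustration; none of them are taken from this SAE's actual code or weights.

```python
# Toy sketch of a "standard" ReLU sparse autoencoder forward pass.
# All names, shapes, and values are hypothetical; the SAE on this page
# has 24,576 features, not 3.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations: relu(W_enc @ (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre_acts]

def sae_decode(features, W_dec, b_dec):
    """Reconstruction: W_dec @ features + b_dec."""
    return [r + b for r, b in zip(matvec(W_dec, features), b_dec)]

# Toy sizes: 2-dimensional input, 3 features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # shape (features, d_in)
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -0.5], [0.0, 1.0, -0.5]]    # shape (d_in, features)
b_dec = [0.0, 0.0]

x = [2.0, -1.0]
features = sae_encode(x, W_enc, b_enc, b_dec)
reconstruction = sae_decode(features, W_dec, b_dec)
print(features, reconstruction)
```

In this toy case only the first feature is active, which mirrors the sparsity a ReLU encoder is meant to produce: negative pre-activations are clipped to zero and contribute nothing to the reconstruction.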


Negative Logits

" Promotion"    -0.72
"division"      -0.72
"bats"          -0.72
"leave"         -0.71
"deals"         -0.71
" divor"        -0.69
"flo"           -0.69
" Seym"         -0.67
" indu"         -0.67
"oi"            -0.66

Positive Logits

" grim"         2.94
" Grim"         1.87
"grim"          1.57
" gruesome"     1.35
" horrifying"   1.01
" improvised"   0.99
" harrowing"    0.95
" horrific"     0.95
" grun"         0.94
" grit"         0.92
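The two logit tables list the output tokens whose logits this feature lowers or raises the most. The page does not state how the scores are computed. One common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix and then sort the scores. The sketch below uses a made-up three-token vocabulary and made-up weights; the function name and all numbers are hypothetical and do not reproduce the values in the tables.

```python
# Toy sketch: rank tokens by how strongly a feature direction pushes
# their logits. Weights and vocabulary are invented for illustration.

def feature_logit_effects(decoder_dir, W_U, vocab):
    """Return (token, score) pairs sorted from most promoted to most suppressed.

    decoder_dir: list of length d_model (the feature's decoder direction)
    W_U:         row-major matrix of shape (d_model, vocab_size)
    vocab:       list of token strings, length vocab_size
    """
    scores = []
    for tok_idx, token in enumerate(vocab):
        score = sum(d * row[tok_idx] for d, row in zip(decoder_dir, W_U))
        scores.append((token, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

vocab = [" grim", " grit", " Promotion"]   # leading spaces are part of the tokens
decoder_dir = [1.0, 0.5]                   # hypothetical 2-dimensional direction
W_U = [
    [2.0, 1.0, -1.0],
    [1.0, 0.0, 0.5],
]

ranked = feature_logit_effects(decoder_dir, W_U, vocab)
for token, score in ranked:
    print(repr(token), score)
```

The head of the sorted list corresponds to a "positive logits" table and the tail to a "negative logits" table.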

Activations Density 0.034%
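To give the density figure a sense of scale, the short calculation below converts it into an approximate token count. It assumes, without confirmation from the page, that density means the fraction of tokens on which the feature is active and that it was measured over the dashboard prompts listed in the configuration (24,576 prompts of 128 tokens each). The variable names are illustrative.

```python
# Back-of-the-envelope estimate of how many tokens activate the feature,
# assuming density = active tokens / total tokens over the dashboard prompts.

num_prompts = 24_576        # from "Prompts (Dashboard)"
tokens_per_prompt = 128     # from "Context Size"
density_percent = 0.034     # from "Activations Density"

total_tokens = num_prompts * tokens_per_prompt
expected_active = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {expected_active:.0f}")
```

Under these assumptions the feature fires on roughly a thousand of about 3.1 million tokens. Because the displayed percentage is rounded, the estimate is only approximate.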
