Explanations

proper names related to a specific individual named Hendricks, likely a celebrity or public figure

oai_token-act-pair · gpt-3.5-turbo

references to a specific individual, John Hendricks, along with mentions of related terms like 'hander' and 'Mercer'

oai_token-act-pair · gpt-4o-mini · triggered by @bot

Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 2-res-jb

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.2.hook_resid_pre

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

24,576

Data Type

torch.float32

Hook Point

blocks.2.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

Skylion007/openwebtext

Hook Point Layer

2

Activation Function

relu
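The configuration lists a "standard" architecture with a ReLU activation function. Below is a minimal sketch of what that usually means. It assumes the common SAE formulation f = ReLU(W_enc (x - b_dec) + b_enc) with reconstruction x_hat = W_dec f + b_dec. This page does not spell out the equations. The function names and the toy sizes (a 2-dimensional residual stream with 3 features) are hypothetical stand-ins for GPT-2 Small's residual stream at blocks.2.hook_resid_pre and the 24,576 features listed above.

```python
def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc); W_enc is d_sae x d_model."""
    centered = [a - b for a, b in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]  # ReLU: a feature "fires" only when f_i > 0

def sae_decode(f, W_dec, b_dec):
    """Reconstruction x_hat = sum_i f_i * W_dec[i] + b_dec; W_dec is d_sae x d_model."""
    x_hat = list(b_dec)
    for f_i, row in zip(f, W_dec):
        for j, w in enumerate(row):
            x_hat[j] += f_i * w
    return x_hat

# Toy sizes (hypothetical): d_model = 2, d_sae = 3
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]
b_dec = [0.0, 0.0]
x = [2.0, -1.0]
f = sae_encode(x, W_enc, b_enc, b_dec)
print(f)                              # [2.0, 0.0, 0.0]
print(sae_decode(f, W_dec, b_dec))    # [2.0, 0.0]
```

Only the first toy feature is active for this input, which is the sparsity the ReLU is meant to produce.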

Negative Logits

ェ

-0.73

ers

-0.65

ォ

-0.65

angelo

-0.64

 Holo

-0.60

 supplementary

-0.59

book

-0.58

 ACTIONS

-0.58

estones

-0.57

friends

-0.57
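GPT-2 uses byte-level BPE. Its vocabulary stores each byte as a printable stand-in character, so a token covering a non-ASCII character can surface as a string such as ãĤ§ or ãĤ©. The sketch below rebuilds the byte-to-character table from OpenAI's GPT-2 encoder and inverts it to recover the real text. decode_token is a hypothetical helper name, not a library function.

```python
def bytes_to_unicode():
    """GPT-2's reversible mapping from each byte value to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:          # non-printable bytes get code points 256, 257, ...
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Invert the table: stand-in character -> original byte
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token):
    """Map each stand-in character back to its byte, then decode the bytes as UTF-8."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(decode_token("ãĤ§"))  # ェ (bytes E3 82 A7, U+30A7)
print(decode_token("ãĤ©"))  # ォ (bytes E3 82 A9, U+30A9)
```

The same table is why leading spaces appear as Ġ in raw vocabulary entries: Ġ maps back to byte 0x20.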

Positive Logits

heit

0.82

 Hendricks

0.82

hander

0.80

eton

0.78

emis

0.75

hyde

0.75

uce

0.75

alez

0.73

igham

0.71

bourg

0.71
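Positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix W_U. The highest- and lowest-scoring tokens are then kept. The sketch below assumes that procedure. The page does not state it, or whether a final layer norm is applied first. The toy vocabulary, W_U, and decoder row are hypothetical.

```python
import heapq

def token_scores(decoder_row, W_U, vocab):
    """Score each vocabulary token t by dot(decoder_row, column t of W_U)."""
    d_model = len(decoder_row)
    return {
        tok: sum(decoder_row[j] * W_U[j][t] for j in range(d_model))
        for t, tok in enumerate(vocab)
    }

def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    positive = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return positive, negative

# Toy example (hypothetical): d_model = 2, vocabulary of 3 tokens
vocab = ["a", "b", "c"]
W_U = [[1.0, 0.0, -1.0],
       [0.0, 1.0, 0.0]]
decoder_row = [0.5, 0.2]
pos, neg = top_and_bottom(token_scores(decoder_row, W_U, vocab), k=1)
print(pos, neg)  # [('a', 0.5)] [('c', -0.5)]
```

With k = 10 over GPT-2's full vocabulary, this yields two ten-entry lists of the kind shown above.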

Activations Density 0.027%
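Activation density is the fraction of tokens on which the feature's activation is nonzero. For scale, suppose the figure were measured over the dashboard prompts in the configuration (24,576 prompts × 128 tokens = 3,145,728 tokens). That is an assumption, since the page does not say which tokens were used. Under it, 0.027% corresponds to roughly 849 active tokens. activation_density is a hypothetical helper, not a library function.

```python
def activation_density(activations):
    """Fraction of tokens on which the feature activation is strictly positive."""
    acts = list(activations)
    if not acts:
        return 0.0
    return sum(1 for a in acts if a > 0) / len(acts)

# Assumed token count: dashboard prompts x tokens per prompt
n_tokens = 24_576 * 128
expected_active = n_tokens * 0.027 / 100   # 0.027% of all tokens
print(n_tokens, round(expected_active))    # 3145728 849
print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 0.25
```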

No Known Activations