
Explanations

references to sins or sinful behavior

oai_token-act-pair · gpt-3.5-turbo


Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 0-res-jb

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.0.hook_resid_pre

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

24,576

Data Type

torch.float32

Hook Point

blocks.0.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

Skylion007/openwebtext

Hook Point Layer

0

Activation Function

relu
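
The configuration fields above can be gathered into one record, which makes it easy to check that they agree with each other. The sketch below is only an illustration: `SAEConfig`, its field names, and the `hook_layer` helper are hypothetical and are not the class used by any particular SAE library. It assumes hook names follow the `blocks.<n>.<hook>` pattern seen in the hook point above, so the layer index can be read off the name.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SAEConfig:
    """Hypothetical container for the fields listed on this page."""

    source_path: str      # first line of the Configuration block
    hook_point: str
    n_features: int
    dtype: str
    architecture: str
    context_size: int
    dataset: str
    activation_fn: str

    @property
    def hook_layer(self) -> int:
        """Layer index parsed from a hook name shaped like 'blocks.<n>.<hook>'."""
        match = re.fullmatch(r"blocks\.(\d+)\..+", self.hook_point)
        if match is None:
            raise ValueError(f"unrecognised hook point: {self.hook_point!r}")
        return int(match.group(1))


# Values copied from the page; only the field names are invented.
cfg = SAEConfig(
    source_path="jbloom/GPT2-Small-SAEs-Reformatted/blocks.0.hook_resid_pre",
    hook_point="blocks.0.hook_resid_pre",
    n_features=24_576,
    dtype="torch.float32",
    architecture="standard",
    context_size=128,
    dataset="Skylion007/openwebtext",
    activation_fn="relu",
)

print(cfg.hook_layer)
```

Parsing the layer from the hook name, rather than storing it separately, keeps the two values from drifting apart.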


Negative Logits

Token          Logit
" Explorer"    -0.68
" Mechanical"  -0.66
"upp"          -0.64
" Booster"     -0.64
" Train"       -0.64
"rew"          -0.63
"ETF"          -0.63
" Uzbek"       -0.62
"unda"         -0.62
"usters"       -0.61

Positive Logits

Token              Logit
"sin"              3.94
" sins"            2.39
"sin"              2.29
"Sin"              2.02
" Sins"            2.00
" sinful"          1.95
"Sin"              1.83
" sinners"         1.83
" transgress"      1.16
" righteousness"   1.15
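
The page does not say how these logit values are produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below shows that ranking on made-up 2-dimensional vectors and a four-token vocabulary; the numbers are toy values, not the real weights, and `logit_effects` and `top_k` are hypothetical helper names.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_k(scores, k, largest=True):
    """Return the k (token, score) pairs with the largest or smallest scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Toy values chosen for illustration only.
decoder_dir = [1.0, 0.5]
unembed = {
    "sin": [3.0, 1.0],
    " sins": [2.0, 0.5],
    " Train": [-0.5, -0.25],
    " Explorer": [-0.5, -0.5],
}

scores = logit_effects(decoder_dir, unembed)
positive = top_k(scores, 2, largest=True)    # most boosted tokens
negative = top_k(scores, 2, largest=False)   # most suppressed tokens
print(positive)
print(negative)
```

Tokens are kept as quoted strings because a leading space is part of the token and changes which vocabulary entry is meant.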

Activations Density 0.028%
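
To give the density figure a sense of scale, the sketch below treats activation density as the fraction of token positions where the feature's activation is greater than zero, which is the usual reading but is an assumption here. It also assumes the density was measured over the dashboard prompt set listed above (24,576 prompts of 128 tokens each); the page does not state which token set was used. `activation_density` is a hypothetical helper name.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0)."""
    if not activations:
        raise ValueError("need at least one activation value")
    return sum(1 for a in activations if a > 0) / len(activations)


# Figures taken from the page.
n_prompts = 24_576
tokens_per_prompt = 128
density = 0.028 / 100  # 0.028%

# Assumption: the density refers to the dashboard prompt set.
total_tokens = n_prompts * tokens_per_prompt
approx_active_tokens = round(total_tokens * density)

print(total_tokens, approx_active_tokens)
```

Under those assumptions the feature would fire on roughly 881 of about 3.1 million tokens, which fits a feature tied to a narrow family of tokens.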
