Explanations

the word "plus" along with a numerical value, potentially indicating a positive association or addition

oai_token-act-pair · gpt-3.5-turbo

phrases indicating the addition or accumulation of quantities

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 0-res-jb

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.0.hook_resid_pre

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 24,576
Data Type: torch.float32
Hook Point: blocks.0.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: Skylion007/openwebtext
Hook Point Layer:
Activation Function: relu
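
To make the "standard" architecture and "relu" activation in the configuration concrete, here is a minimal sketch of a ReLU sparse autoencoder forward pass in plain Python. It assumes the common formulation in which the decoder bias is subtracted from the input before encoding; whether this particular SAE does so is not stated on the page. The function names, the toy dimensions (3 inputs, 4 features) and all weights are hypothetical and chosen only so the arithmetic is easy to follow. The real SAE has 24,576 features.

```python
def relu(values):
    """Element-wise ReLU, the activation function named in the configuration."""
    return [max(0.0, v) for v in values]


def sae_encode(x, W_enc, b_enc, b_dec):
    """Encode a residual-stream vector into sparse feature activations.

    Assumption: the input is centred by the decoder bias before encoding.
    W_enc holds one row of input weights per feature.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [
        sum(c * w for c, w in zip(centered, row)) + b
        for row, b in zip(W_enc, b_enc)
    ]
    return relu(pre_acts)


def sae_decode(features, W_dec, b_dec):
    """Reconstruct the input as b_dec plus a weighted sum of decoder rows."""
    recon = list(b_dec)
    for f, row in zip(features, W_dec):
        recon = [r + f * w for r, w in zip(recon, row)]
    return recon


# Toy weights (hypothetical, not taken from the real SAE).
x = [1.0, 2.0, 0.5]
b_dec = [0.0, 1.0, 0.5]
W_enc = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.5, 0.5, 0.0], [-1.0, -1.0, 1.0]]
b_enc = [0.0, 0.0, -0.5, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]

features = sae_encode(x, W_enc, b_enc, b_dec)
reconstruction = sae_decode(features, W_dec, b_dec)
print(features)        # [1.0, 0.0, 0.5, 0.0]
print(reconstruction)  # [1.0, 2.0, 0.5]
```

Only two of the four toy features are non-zero after the ReLU, which is the sparsity the architecture is meant to produce.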

Negative Logits

Token        Logit
"Cry"        -0.75
"ãĤ¶"        -0.65
"urg"        -0.62
"ami"        -0.61
"hap"        -0.61
"ammers"     -0.60
"robe"       -0.59
"anes"       -0.59
"terness"    -0.58
"DEBUG"      -0.57
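
The token "ãĤ¶" in the negative logits is not extraction damage. GPT-2's byte-level BPE vocabulary displays raw bytes as printable stand-in characters. The sketch below rebuilds that byte-to-character table as it is defined in the GPT-2 reference tokenizer and inverts it to recover the underlying text. The helper name `decode_gpt2_token` is hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2's byte-level BPE."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_gpt2_token(token):
    """Map a vocabulary string back to the text it encodes (hypothetical helper)."""
    byte_for_char = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(byte_for_char[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_gpt2_token("ãĤ¶"))          # ザ
print(repr(decode_gpt2_token("Ġplus")))  # ' plus'
```

The three characters stand for the bytes E3 82 B6, which is the UTF-8 encoding of the katakana character ザ. The same table explains why a leading space is written as "Ġ" in raw vocabulary listings.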

Positive Logits

Token            Logit
" plus"          3.76
" minus"         2.51
"plus"           2.30
" PLUS"          2.25
" Plus"          2.03
"minus"          1.97
"Plus"           1.75
(token missing)  1.36
(token missing)  1.29
" combined"      1.23
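
A common way to produce positive and negative logit lists like these is to project the feature's decoder direction through the model's unembedding matrix and rank the tokens by the resulting score. The sketch below assumes that is how this dashboard works, which the page does not confirm. The function name, the two-dimensional vectors and every number are hypothetical toy values, not the real GPT-2 weights.

```python
def logit_effects(decoder_direction, unembed_rows, vocab):
    """Score each token by the dot product of its unembedding vector with the
    feature's decoder direction, sorted from most boosted to most suppressed."""
    scores = [
        (token, sum(d * u for d, u in zip(decoder_direction, row)))
        for token, row in zip(vocab, unembed_rows)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): one unembedding vector per token.
vocab = [" plus", " minus", "Cry", " the"]
unembed_rows = [[3.0, 1.0], [2.0, 1.0], [-1.0, 0.5], [0.0, 0.0]]
decoder_direction = [1.0, 0.5]

ranked = logit_effects(decoder_direction, unembed_rows, vocab)
top_positive = ranked[:2]
top_negative = ranked[-1:]
print(top_positive)  # [(' plus', 3.5), (' minus', 2.5)]
print(top_negative)  # [('Cry', -0.75)]
```

Under this reading, a large positive value means the feature pushes the model toward predicting that token, and a negative value means it pushes away from it.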

Activations Density 0.014%
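
As a rough sketch of what a density of 0.014% means, the snippet below treats it as the fraction of token positions on which the feature's activation is above zero, measured over the dashboard's 24,576 prompts of 128 tokens each. That definition is an assumption, and the displayed percentage is rounded, so the token count is approximate. The function name and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    flat = [a for prompt in activations for a in prompt]
    return sum(1 for a in flat if a > threshold) / len(flat)


# Toy example: the feature fires on 1 of 8 token positions.
toy_activations = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.0, 0.0, 0.0]]
toy_density = activation_density(toy_activations)
print(toy_density)  # 0.125

# Scale of the reported figure, using the prompt counts from the configuration.
total_tokens = 24_576 * 128
approx_firing_tokens = round(total_tokens * 0.014 / 100)
print(total_tokens)          # 3145728
print(approx_firing_tokens)  # 440
```

On that reading the feature is active on only a few hundred of roughly 3.1 million token positions.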

No Known Activations