INDEX

Explanations

phrases emphasizing a strong opinion or negation

oai_token-act-pair · gpt-3.5-turbo

emphatic negations and strong disclaimers

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 11-res-jb
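The page does not say how these similarities are computed. The sketch below assumes one common choice: cosine similarity between SAE decoder directions, one vector per feature. The function names and toy vectors are hypothetical. A real decoder for this SAE would have 24,576 rows, each of length 768 (the GPT-2 small residual width).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar_features(query, decoder_rows, k=3, exclude=None):
    """Rank features by cosine similarity of their decoder rows to `query`.

    `exclude` skips one index, e.g. the query feature itself.
    """
    scored = [
        (i, cosine(query, row))
        for i, row in enumerate(decoder_rows)
        if i != exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional decoder rows for four features.
W_dec = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]
result = top_similar_features(W_dec[0], W_dec, k=2, exclude=0)
print(result)  # feature 1 ranks first, feature 2 (orthogonal) second
```

Cosine similarity ignores vector norms. This is usually what you want when comparing feature directions, because decoder rows may be scaled differently.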

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.11.hook_resid_pre

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

24,576

Data Type

torch.float32

Hook Point

blocks.11.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

Skylion007/openwebtext

Hook Point Layer

Activation Function

relu
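The configuration labels `standard` and `relu` usually refer to the sparse autoencoder formulation sketched below. The decoder bias is subtracted from the input residual vector. The result is projected by an encoder matrix plus bias, passed through ReLU, and reconstructed by a decoder. This is a minimal pure-Python sketch with toy sizes. The weight names and the pre-encoder `b_dec` subtraction are assumptions about this particular SAE. The real one maps 768-dimensional `blocks.11.hook_resid_pre` vectors to 24,576 features.

```python
def matvec(rows, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(r * v for r, v in zip(row, vec)) for row in rows]


def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc).

    W_enc is stored as one row per feature (d_sae rows of length d_model).
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]  # ReLU zeroes negative pre-activations


def sae_decode(f, W_dec, b_dec):
    """Reconstruction x_hat = sum_i f_i * W_dec[i] + b_dec."""
    x_hat = list(b_dec)
    for fi, row in zip(f, W_dec):
        if fi != 0.0:  # inactive features contribute nothing
            for j, w in enumerate(row):
                x_hat[j] += fi * w
    return x_hat


# Toy SAE: d_model = 2, d_sae = 3 (hypothetical weights).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, -0.5, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]
b_dec = [0.5, 0.5]

x = [1.5, 1.5]
f = sae_encode(x, W_enc, b_enc, b_dec)
x_hat = sae_decode(f, W_dec, b_dec)
print(f)      # [1.0, 0.5, 0.0]
print(x_hat)  # [1.5, 1.0]
```

Most entries of `f` are zero for a typical token. Sparsity is what makes individual features, like the one on this page, interpretable.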

Negative Logits

"lahoma"  -0.73
"apters"  -0.67
"rity"    -0.67
"urer"    -0.65
"orio"    -0.65
"iary"    -0.65
"liest"   -0.65
"auga"    -0.65
"urers"   -0.65
"ariat"   -0.65

Positive Logits

"LY"      1.57
"ALLY"    1.45
"THING"   1.45
"ELY"     1.42
"ONE"     1.41
"HO"      1.37
"OSE"     1.37
"LESS"    1.36
"NESS"    1.36
" THERE"  1.36
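The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature's output direction lowers or raises their logits. The sketch below assumes each token's score is the dot product of the feature's decoder row with that token's column of the unembedding matrix `W_U`. Any final LayerNorm is ignored here. The toy vocabulary and weights are hypothetical.

```python
def logit_weights(decoder_row, W_U):
    """Dot product of one feature's decoder row with each unembedding column.

    W_U is stored as d_model rows, each of length vocab_size.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_row[j] * W_U[j][t] for j in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k=2):
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


vocab = ["LY", "ALLY", "the", "urer", "lahoma"]
W_U = [
    [1.0, 0.5, 0.0, -0.5, -1.0],  # d_model = 2 rows, one column per token
    [0.5, 1.0, 0.0, -0.5, -0.5],
]
d_feature = [1.0, 0.5]  # hypothetical decoder row for one feature
positive, negative = top_and_bottom(logit_weights(d_feature, W_U), vocab)
print(positive)  # [('LY', 1.25), ('ALLY', 1.0)]
print(negative)  # [('lahoma', -1.25), ('urer', -0.75)]
```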

Activations Density 0.158%
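Activation density is the fraction of tokens on which the feature fires, meaning its activation is above zero. The sketch below is minimal and assumes density is measured over a flat list of per-token activations. The page does not state which token sample produced the 0.158% figure. If it were the dashboard's 24,576 × 128 = 3,145,728 tokens, 0.158% would correspond to roughly 4,970 firing tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


acts = [0.0] * 997 + [2.3, 0.7, 1.1]  # 3 firing tokens out of 1,000
print(activation_density(acts))      # 0.003

n_tokens = 24_576 * 128               # 3,145,728 tokens
print(round(n_tokens * 0.00158))     # 4970
```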

No Known Activations
