Explanations

references to specific animals, such as dwarves, elephants, tigers, and monkeys

oai_token-act-pair · gpt-3.5-turbo

references to fantasy or mythical creatures, animals, and characters

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.1.hook_resid_pre

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 24,576
Data Type: torch.float32
Hook Point: blocks.1.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: Skylion007/openwebtext
Hook Point Layer: 1
Activation Function: relu
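
The configuration lists a "standard" architecture with a ReLU activation. The sketch below shows what that usually means for a sparse autoencoder. It is a minimal illustration that assumes the common formulation f = ReLU(W_enc^T (x - b_dec) + b_enc) and x_hat = W_dec^T f + b_dec. Whether this particular SAE subtracts b_dec before encoding is an assumption. Toy dimensions stand in for the real ones (768-dimensional residual stream, 24,576 features).

```python
# Minimal sketch of a "standard" ReLU sparse autoencoder (assumed formulation):
#   f     = ReLU(W_enc^T (x - b_dec) + b_enc)
#   x_hat = W_dec^T f + b_dec
# Toy sizes here; the real SAE maps d_model = 768 to d_sae = 24,576.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    # Element-wise ReLU: negative pre-activations become 0.
    return [max(0.0, x) for x in v]


def encode(x: Vector, W_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """W_enc has shape [d_model][d_sae]; returns d_sae feature activations."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [
        sum(centered[i] * W_enc[i][j] for i in range(len(centered))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre)


def decode(f: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """W_dec has shape [d_sae][d_model]; returns the d_model reconstruction."""
    return [
        sum(f[j] * W_dec[j][i] for j in range(len(f))) + b_dec[i]
        for i in range(len(b_dec))
    ]


# Toy example: d_model = 2, d_sae = 3 (hypothetical weights).
x = [1.0, 2.0]
W_enc = [[1.0, -1.0, 0.5], [0.0, 1.0, 0.5]]
b_enc = [0.0, -5.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
b_dec = [0.0, 0.0]

f = encode(x, W_enc, b_enc, b_dec)
print(f)                         # [1.0, 0.0, 1.5]
print(decode(f, W_dec, b_dec))   # [1.0, 1.5]
```

The second feature's pre-activation is negative (-4.0), so ReLU zeroes it. This is how most features stay inactive on most tokens.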

Negative Logits

nce        -0.70
aton       -0.69
NC         -0.69
wise       -0.68
Asset      -0.68
IP         -0.67
ty         -0.67
lic        -0.66
York       -0.66

Positive Logits

aurus       1.21
hip         1.17
ervatives   1.12
mith        1.06
ongs        1.04
uggest      1.01
terday      1.00
paces       1.00
agascar     0.99
ettings     0.97
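
Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. The sketch below makes that assumption explicit. It ignores the final layer norm and all intermediate layers, and the names `d`, `W_U`, and `vocab` are hypothetical. The dashboard's exact procedure may differ.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect.
# Assumption: effect = d @ W_U, where d is the feature's decoder row (length
# d_model) and W_U is the unembedding matrix [d_model][vocab_size].
# Layer norm and downstream layers are ignored (a simplification).
from typing import List, Tuple


def logit_effects(d: List[float], W_U: List[List[float]]) -> List[float]:
    vocab_size = len(W_U[0])
    return [sum(d[i] * W_U[i][t] for i in range(len(d))) for t in range(vocab_size)]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(zip(vocab, effects), key=lambda p: p[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative


# Toy example with a 4-token vocabulary (hypothetical values).
d = [1.0, -1.0]
W_U = [[2.0, 0.0, 1.0, -1.0], [0.0, 1.0, 1.0, 2.0]]
vocab = ["a", "b", "c", "d"]

effects = logit_effects(d, W_U)   # [2.0, -1.0, 0.0, -3.0]
positive, negative = top_and_bottom(effects, vocab, k=2)
print(positive)   # [('a', 2.0), ('c', 0.0)]
print(negative)   # [('d', -3.0), ('b', -1.0)]
```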

Activation Density: 0.096%
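
Activation density is the fraction of tokens on which the feature fires, meaning its activation is above zero. The page does not state which tokens the 0.096% figure was measured over. If we assume it is the dashboard sample of 24,576 prompts at 128 tokens each, the feature would fire on roughly 3,020 of about 3.1 million tokens. The sketch below computes density from a list of activations and reproduces that estimate under the stated assumption.

```python
# Sketch: activation density = fraction of tokens with activation > 0.
from typing import Sequence


def activation_density(acts: Sequence[float]) -> float:
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > 0) / len(acts)


# Assumption: the 0.096% density was measured over the dashboard sample.
total_tokens = 24_576 * 128                          # 3,145,728 tokens
expected_firing = round(total_tokens * 0.096 / 100)  # about 3,020 tokens

print(activation_density([0.0, 0.5, 0.0, 2.0]))      # 0.5
print(total_tokens, expected_firing)                 # 3145728 3020
```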

No Known Activations