Explanations

references to fantasy elements or characters

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 8-res_post_32k-oai

Configuration

jbloom/GPT2-Small-OAI-v5-32k-resid-post-SAEs/v5_32k_layer_8.pt

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 32,768
Data Type: torch.float32
Hook Name: blocks.8.hook_resid_post
Hook Layer:
Architecture: standard
Context Size:
Dataset: Skylion007/openwebtext
Activation Function: topk
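The configuration names `topk` as the activation function. As a rough illustration only, a top-k activation keeps the k largest pre-activations per token and zeroes the rest. The sketch below uses plain Python lists; the function name `topk_activation` and the value of `k` are hypothetical, since the page does not state the k used by this SAE, and some top-k SAEs additionally apply a ReLU, which is omitted here.

```python
def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations and set every other entry to zero.

    Assumption: this is a generic top-k sketch, not this SAE's actual code.
    """
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices of the k largest values; ties keep the earlier position.
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


# Toy pre-activations for five hypothetical features (made-up numbers).
latents = topk_activation([0.2, -1.0, 3.5, 0.9, 2.1], k=2)
print(latents)
```

With only k features allowed to fire per token, sparsity is set directly by k rather than by a penalty term, which is one common reason to choose this activation.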

Negative Logits

" elbows"       -0.83
"ibaba"         -0.76
" boycot"       -0.73
" boycott"      -0.71
" whine"        -0.70
"bernatorial"   -0.70
"jad"           -0.69
"Pwr"           -0.68
" bandwagon"    -0.68
"hower"         -0.68

Positive Logits

" Mysteries"    1.17
" Grimoire"     1.01
" cryptic"      0.97
"spell"         0.97
"potion"        0.96
" mysteries"    0.95
" secrets"      0.95
" Secrets"      0.93
" Sorceress"    0.90
" runes"        0.88

Activations Density 0.068%
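To put the density figure in context, the sketch below assumes that density means the fraction of dashboard token positions on which the feature has a nonzero activation. The page does not define the term, so this reading and the helper name `activation_density` are assumptions. Under that reading, 0.068% of the 24,576 x 128 dashboard tokens comes to roughly two thousand positions.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is above zero.

    activations is a list of prompts, each a list of per-token activations.
    """
    total = sum(len(prompt) for prompt in activations)
    if total == 0:
        return 0.0
    fired = sum(1 for prompt in activations for a in prompt if a > 0)
    return fired / total


# Toy data: two prompts of four tokens, one active position (made-up numbers).
toy = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.0, 0.0, 0.0]]
density = activation_density(toy)

# Back-of-envelope count using the figures listed on this page.
total_tokens = 24_576 * 128
approx_active = round(total_tokens * 0.068 / 100)
print(density, total_tokens, approx_active)
```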

No Known Activations