Explanations

references to hacking or cyber security breaches

oai_token-act-pair · gpt-3.5-turbo

occurrences of the word "hacked."

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 1-res-jb
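
A minimal sketch of how a cosine-similarity ranking like this one could be computed, assuming features are compared by the direction of their decoder vectors. The vectors and feature names below are hypothetical toy values, not weights from this SAE.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_similar(query, candidates, k=2):
    """Return the k (name, similarity) pairs closest in direction to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical 3-dimensional directions; GPT-2 small's residual stream is 768-dimensional.
query = [1.0, 0.0, 0.0]
candidates = {
    "feature_a": [0.9, 0.1, 0.0],   # nearly parallel to the query
    "feature_b": [0.0, 1.0, 0.0],   # orthogonal to the query
    "feature_c": [-1.0, 0.0, 0.0],  # opposite to the query
}
ranking = top_similar(query, candidates)
```

Cosine similarity ignores vector length, so two features rank as close when they point the same way even if their norms differ.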

Configuration

jbloom/GPT2-Small-SAEs-Reformatted/blocks.1.hook_resid_pre

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 24,576
Data Type: torch.float32
Hook Point: blocks.1.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: Skylion007/openwebtext
Hook Point Layer:
Activation Function: relu
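
A minimal sketch of what a "standard" SAE with a relu activation typically computes, assuming the common formulation in which the input is centered by the decoder bias before encoding (some implementations skip that step). All names, shapes, and weights here are hypothetical toy values, not the parameters of this SAE.

```python
def relu(values):
    """Elementwise max(0, x)."""
    return [max(0.0, v) for v in values]

def encode(x, w_enc, b_enc, b_dec):
    """Feature activations: relu((x - b_dec) @ w_enc + b_enc).

    x has length d_model; w_enc is d_model x n_features.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_activations = [
        sum(centered[i] * w_enc[i][j] for i in range(len(centered))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre_activations)

def decode(features, w_dec, b_dec):
    """Reconstruction: features @ w_dec + b_dec, with w_dec of shape n_features x d_model."""
    return [
        sum(features[j] * w_dec[j][i] for j in range(len(features))) + b_dec[i]
        for i in range(len(b_dec))
    ]

# Toy sizes: d_model = 2, n_features = 3 (the real SAE has 24,576 features).
x = [1.0, -2.0]
b_dec = [0.0, 0.0]
b_enc = [0.0, 0.0, 0.0]
w_enc = [[1.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
w_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [-1.0, 0.0]]

features = encode(x, w_enc, b_enc, b_dec)
reconstruction = decode(features, w_dec, b_dec)
```

In this toy case only the first feature fires, so the reconstruction keeps the first coordinate of the input and drops the second. That illustrates how the relu zeroes out negative pre-activations.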

Negative Logits

Token                        Logit
"bu"                         -0.73
" Archdemon"                 -0.71
" Duty"                      -0.66
"BuyableInstoreAndOnline"    -0.66
" Veter"                     -0.66
"••••"                       -0.66
"aver"                       -0.64
"Family"                     -0.63
"xious"                      -0.62
" Liber"                     -0.62

Positive Logits

Token                        Logit
" hacked"                    1.21
" hack"                      0.92
" hacks"                     0.84
"ileaks"                     0.83
"nesday"                     0.80
" hacking"                   0.79
" faked"                     0.76
"intosh"                     0.75
"ividual"                    0.74
" stolen"                    0.73
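
A minimal sketch of how per-token logit values like these could be produced, assuming they come from projecting the feature's decoder direction through the model's unembedding matrix. The function names, the two-dimensional direction, and the weights are hypothetical and chosen only for illustration. They do not reproduce the values listed above.

```python
def logit_effects(decoder_direction, unembedding, vocab):
    """Dot the feature direction with each token's unembedding column.

    unembedding is d_model x vocab_size; returns {token: logit effect}.
    """
    return {
        token: sum(d * unembedding[i][col] for i, d in enumerate(decoder_direction))
        for col, token in enumerate(vocab)
    }

def top_and_bottom(effects, k):
    """Return the k most promoted and k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

# Hypothetical toy vocabulary and weights (d_model = 2).
vocab = [" hacked", " hack", " Duty", "bu"]
decoder_direction = [1.0, 0.5]
unembedding = [[1.0, 0.8, -0.5, -0.6],
               [0.4, 0.2, -0.2, -0.3]]

effects = logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(effects, k=1)
```

Under this reading, a positive value means the feature pushes the model toward predicting that token when it is active, and a negative value means it pushes away from it.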

Activations Density 0.011%
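
A rough back-of-the-envelope reading of the density figure, assuming it is the fraction of dashboard tokens on which the feature is nonzero and that it was measured over the full 24,576 × 128 token sample. The page does not confirm either assumption.

```python
prompts = 24_576          # dashboard prompts
tokens_per_prompt = 128   # context size
density_percent = 0.011   # activation density as shown on the page

total_tokens = prompts * tokens_per_prompt
expected_active_tokens = total_tokens * density_percent / 100

print(total_tokens)                   # total tokens in the dashboard sample
print(round(expected_active_tokens))  # approximate number of tokens where the feature fires
```

On these assumptions the feature would fire on roughly 346 of about 3.1 million tokens.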
