Explanations

phrases related to getting away with something

oai_token-act-pair · gpt-3.5-turbo

Configuration

neuronpedia/gpt2-small__res_scefr-ajt/6-res_scefr-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.6.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

6

Activation Function

relu
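
The configuration above (standard architecture, relu activation, hook point blocks.6.hook_resid_pre) appears to describe a sparse autoencoder reading the layer-6 residual stream. The sketch below shows what such a forward pass could look like, in pure Python with toy dimensions so it runs without extra libraries. The names sae_encode, sae_decode, W_enc, b_enc, W_dec and b_dec are hypothetical, and subtracting the decoder bias before encoding is an assumed convention, not something the source states. The real dictionary has 46,080 features over GPT-2 small's 768-dimensional residual stream.

```python
import random


def relu(value):
    """ReLU, the activation function named in the configuration."""
    return value if value > 0.0 else 0.0


def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations: relu((x - b_dec) @ W_enc + b_enc).

    W_enc has shape (d_model, n_features). Centering by b_dec is an
    assumption for illustration.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    activations = []
    for j in range(len(b_enc)):
        pre_activation = b_enc[j] + sum(
            c * W_enc[i][j] for i, c in enumerate(centered)
        )
        activations.append(relu(pre_activation))
    return activations


def sae_decode(activations, W_dec, b_dec):
    """Reconstruction: activations @ W_dec + b_dec, W_dec is (n_features, d_model)."""
    return [
        b_dec[k] + sum(a * W_dec[j][k] for j, a in enumerate(activations))
        for k in range(len(b_dec))
    ]


# Toy sizes for the demo; the real values would be 768 and 46,080.
random.seed(0)
D_MODEL, N_FEATURES = 4, 12
W_enc = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(D_MODEL)]
W_dec = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(N_FEATURES)]
b_enc = [0.0] * N_FEATURES
b_dec = [0.0] * D_MODEL

x = [random.gauss(0, 1) for _ in range(D_MODEL)]  # stand-in residual vector
acts = sae_encode(x, W_enc, b_enc, b_dec)
recon = sae_decode(acts, W_dec, b_dec)
```

With random weights the reconstruction is meaningless; the point is only the shape of the computation, where each entry of acts corresponds to one feature such as the one shown on this page.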

Negative Logits

Token       Logit
"ural"      -0.36
"ials"      -0.34
"Uz"        -0.33
"azel"      -0.33
" Yamato"   -0.32
"ulum"      -0.32
"snap"      -0.31
"Emer"      -0.31
"hops"      -0.31
"hari"      -0.30

Positive Logits

Token          Logit
" safely"      0.41
" Vend"        0.32
" Horses"      0.31
" Solitaire"   0.30
" Ride"        0.30
"ARM"          0.30
" Riding"      0.30
" horses"      0.29
"©¶æ"          0.29
"Veh"          0.29
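
Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below illustrates that computation on a toy vocabulary. The function names feature_logit_effects and top_and_bottom are hypothetical, the matrix values are made up for the demo and are not GPT-2 weights, and whether this dashboard applies any extra normalization is not stated in the source.

```python
def feature_logit_effects(decoder_dir, W_U):
    """Project a decoder direction (d_model) through W_U (d_model x vocab)."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example: 2-dimensional "residual stream", 4-token vocabulary.
# All numbers below are invented purely to exercise the functions.
vocab = [" safely", " Ride", "ural", "ials"]
decoder_dir = [1.0, -1.0]
W_U = [
    [0.30, 0.20, -0.10, -0.20],
    [-0.10, -0.10, 0.25, 0.10],
]

effects = feature_logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(effects, vocab, k=2)
```

Here negative lists the tokens the toy direction pushes down most, and positive the tokens it pushes up most, mirroring the layout of the two lists on this page.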

Activations Density 0.050%
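
As a rough worked example, the density can be turned into a token count. This assumes the density is the fraction of dashboard token positions on which the feature has a positive activation, and that it was measured over the 12,288 prompts of 128 tokens listed above. Neither assumption is confirmed by the source, and activation_density is a hypothetical helper.

```python
N_PROMPTS = 12_288          # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Context Size"
DENSITY_PERCENT = 0.050     # from "Activations Density"

total_tokens = N_PROMPTS * TOKENS_PER_PROMPT
# Implied number of token positions where the feature fires (assumed definition).
expected_active = total_tokens * DENSITY_PERCENT / 100


def activation_density(activations):
    """Fraction of positions with a strictly positive activation."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0.0) / len(activations)


# Tiny usage example: one active position out of four.
example_density = activation_density([0.0, 0.0, 0.5, 0.0])
```

Under these assumptions the feature would be active on roughly 786 of the 1,572,864 dashboard tokens.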

No Known Activations