Explanations

names or terms related to political figures or events

oai_token-act-pair · gpt-3.5-turbo

Configuration

neuronpedia/gpt2-small__res_scefr-ajt/6-res_scefr-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.6.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

6

Activation Function

relu
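
The configuration above describes a standard ReLU sparse autoencoder reading from `blocks.6.hook_resid_pre` with 46,080 features. A minimal sketch of the encoder step follows, assuming the common formulation `ReLU(x @ W_enc + b_enc)`. The function names, the toy sizes, and the random weights are illustrative assumptions, not values from this SAE. The residual width of 768 is that of GPT-2 small.

```python
import random

D_MODEL = 768     # residual stream width of GPT-2 small
D_SAE = 46_080    # number of features listed in the configuration


def relu(values):
    """Elementwise ReLU, the activation function named in the configuration."""
    return [max(0.0, v) for v in values]


def encode(x, w_enc, b_enc):
    """Feature activations under the assumed form ReLU(x @ W_enc + b_enc).

    x: d_model floats; w_enc: d_model rows of d_sae floats; b_enc: d_sae floats.
    """
    pre = [
        sum(x[i] * w_enc[i][j] for i in range(len(x))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre)


# Toy sizes so the sketch runs instantly. The real shapes would be
# (D_MODEL, D_SAE) for w_enc and (D_SAE,) for b_enc.
rng = random.Random(0)
d_model, d_sae = 4, 8
w_enc = [[rng.gauss(0, 1) for _ in range(d_sae)] for _ in range(d_model)]
b_enc = [-0.5] * d_sae  # hypothetical bias, chosen only to make some features inactive
x = [rng.gauss(0, 1) for _ in range(d_model)]

acts = encode(x, w_enc, b_enc)
print(acts)
```

Because of the ReLU, every activation is non-negative, and a feature counts as active on a token only when its pre-activation is above zero.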

Negative Logits

sburgh      -1.01
ateral      -0.92
ships       -0.85
suit        -0.84
士          -0.82
ipop        -0.79
earable     -0.78
dress       -0.78
itarian     -0.78
irtual      -0.77

Positive Logits

rique       1.16
rament      1.06
cest        1.03
rier        1.00
ces         0.96
rous        0.95
ris         0.94
riers       0.92
acle        0.92
rel         0.91
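
Logit lists like these are usually obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. Whether this page computes them exactly that way is an assumption. The sketch below uses hypothetical toy numbers and a hypothetical helper `logit_effects`. Only the token strings are borrowed from the lists.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Project one feature's decoder direction onto every vocabulary token.

    decoder_dir: d_model floats; unembed: d_model rows of len(vocab) floats.
    Returns (token, effect) pairs sorted from most negative to most positive.
    """
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    return sorted(zip(vocab, effects), key=lambda pair: pair[1])


# Hypothetical toy values, not taken from the real model.
vocab = ["rique", "ces", "suit", "dress"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.8, -0.6, -0.4],
    [-0.2, -0.1, 0.4, 0.6],
]

ranked = logit_effects(decoder_dir, unembed, vocab)
most_negative = ranked[:2]         # tokens the feature pushes down
most_positive = ranked[-2:][::-1]  # tokens the feature pushes up
print(most_negative, most_positive)
```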

Activations Density 2.334%
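
To put the density figure in context, the sketch below assumes that density means the fraction of token positions in the dashboard prompts on which the feature has a non-zero activation. Under that assumption it estimates how many of the 12,288 × 128 tokens the feature fires on. The helper `activation_density` is hypothetical and only illustrates the definition.

```python
N_PROMPTS = 12_288        # prompts in the dashboard
TOKENS_PER_PROMPT = 128   # tokens per prompt (the context size)
DENSITY = 0.02334         # 2.334% as reported


def activation_density(activations):
    """Fraction of token positions with a strictly positive activation."""
    return sum(1 for a in activations if a > 0) / len(activations)


total_tokens = N_PROMPTS * TOKENS_PER_PROMPT
estimated_active = round(total_tokens * DENSITY)
print(total_tokens, estimated_active)
```

With these numbers the dashboard covers 1,572,864 tokens, so a density of 2.334% corresponds to roughly 36,700 active tokens.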
