
Explanations

statements of responsibility or attribution for certain actions or situations

oai_token-act-pair · gpt-3.5-turbo


Configuration: neuronpedia/gpt2-small__res_scl-ajt/6-res_scl-ajt

Prompts (Dashboard): 12,288 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 46,080
Data Type: torch.float32
Hook Point: blocks.6.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: apollo-research/Skylion007-openwebtext-tokenizer-gpt2
Hook Point Layer: 6
Activation Function: relu
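The configuration lists a "standard" architecture with a ReLU activation. A common formulation of such a sparse autoencoder (the SAELens convention, assumed here rather than confirmed by this page) is f = ReLU(W_enc (x - b_dec) + b_enc) for the feature activations and x_hat = W_dec f + b_dec for the reconstruction. For this configuration, x would be a 768-dimensional GPT-2 small residual-stream vector taken at blocks.6.hook_resid_pre, and f would have 46,080 entries (60 x 768). The minimal sketch below uses toy dimensions and made-up weights; the function names are hypothetical.

```python
# Minimal sketch of an assumed "standard" ReLU SAE forward pass:
#   f     = ReLU(W_enc @ (x - b_dec) + b_enc)
#   x_hat = W_dec @ f + b_dec
# Toy sizes (d_model=2, d_sae=3) stand in for the real 768 and 46,080.
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    """Elementwise ReLU."""
    return [max(0.0, x) for x in v]


def sae_encode(x: Vector, W_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Feature activations; W_enc has shape (d_sae, d_model)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return relu(pre)


def sae_decode(f: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction; W_dec has shape (d_model, d_sae)."""
    return [r + b for r, b in zip(matvec(W_dec, f), b_dec)]


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
x = [1.0, 2.0]
b_dec = [0.0, 0.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, -1.0, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

f = sae_encode(x, W_enc, b_enc, b_dec)  # third feature is negative pre-ReLU, so it is zeroed
x_hat = sae_decode(f, W_dec, b_dec)
print(f, x_hat)
```

Only features with a positive pre-activation survive the ReLU, which is what makes the feature vector sparse.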



Negative Logits
"quart"  -0.82
"ylon"  -0.79
"frey"  -0.75
"tering"  -0.74
"TERN"  -0.73
"cher"  -0.73
"chers"  -0.72
"zig"  -0.70
"ilet"  -0.70
"mare"  -0.69

Positive Logits
" citiz"  0.98
"Ohio"  0.86
" stewards"  0.80
"orate"  0.79
" mischief"  0.76
"responsible"  0.75
"axter"  0.75
" compe"  0.74
" explan"  0.73
" behav"  0.73
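Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix W_U and ranking the vocabulary by the resulting score. The sketch below assumes that method (this page does not state how its lists were computed) and ignores the final LayerNorm. The vocabulary, weights, and function names are all hypothetical toy values.

```python
# Minimal sketch (assumed method): score every token by decoder_dir . W_U[:, token],
# then report the highest scores as "positive logits" and the lowest as "negative logits".
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], W_U: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """W_U has shape (d_model, vocab_size); returns one score per token."""
    scores = {}
    for t, tok in enumerate(vocab):
        scores[tok] = sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
    return scores


def top_and_bottom(scores: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Top-k in descending order, bottom-k with the most negative first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Made-up vocabulary and weights (d_model=2, vocab_size=4) for illustration only.
vocab = [" responsible", " blame", " banana", " the"]
W_U = [[0.9, 0.5, -0.7, 0.0],
       [0.3, 0.2, 0.1, 0.4]]
decoder_dir = [1.0, 0.0]

scores = logit_effects(decoder_dir, W_U, vocab)
top, bottom = top_and_bottom(scores, k=2)
print("positive:", top)
print("negative:", bottom)
```

Leading spaces are part of GPT-2 tokens, which is why entries such as " citiz" and "Ohio" are listed as distinct strings.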

Activation Density: 5.897%
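Activation density is usually the fraction of token positions at which the feature's activation is greater than zero. Assuming it was measured over the dashboard prompts (12,288 prompts of 128 tokens, i.e. 1,572,864 tokens; the page does not say which sample was used), 5.897% would correspond to roughly 92,752 active positions. The function name and toy activations below are hypothetical.

```python
# Minimal sketch: activation density = (# positions with activation > 0) / (# positions).
from typing import List


def activation_density(activations: List[List[float]]) -> float:
    """activations[p][t] is the feature activation at token t of prompt p."""
    total = 0
    active = 0
    for prompt in activations:
        for a in prompt:
            total += 1
            if a > 0:
                active += 1
    return active / total if total else 0.0


# Toy example: 2 prompts x 4 tokens, 3 active positions -> density 3/8.
toy = [[0.0, 1.2, 0.0, 0.0],
       [0.0, 0.0, 3.4, 0.5]]
print(activation_density(toy))

# Back-of-envelope count, assuming density was measured over the dashboard prompts.
n_tokens = 12_288 * 128                  # 1,572,864 token positions
approx_active = round(0.05897 * n_tokens)
print(n_tokens, approx_active)
```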

