Explanations

expressions of praise or approval within longer text passages

oai_token-act-pair · gpt-3.5-turbo

Configuration

neuronpedia/gpt2-small__res_scl-ajt/6-res_scl-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.6.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

6

Activation Function

relu
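The configuration above describes a "standard" SAE with a ReLU activation that reads GPT-2 small's residual stream at blocks.6.hook_resid_pre, which is TransformerLens's name for the residual stream entering block 6. Because GPT-2 small's residual stream is 768-dimensional, 46,080 features is a 60-fold expansion. The sketch below assumes the common formulation f = ReLU(W_enc (x - b_dec) + b_enc) and x_hat = W_dec f + b_dec. This page does not say whether this particular SAE subtracts b_dec before encoding, and the function names are hypothetical. Toy dimensions stand in for 768 and 46,080.

```python
# Minimal sketch of a "standard" ReLU sparse autoencoder (assumed formulation,
# not confirmed as this SAE's exact parameterization). Pure Python, toy sizes.

def relu(v):
    # Elementwise ReLU: negative pre-activations become 0.0
    return [max(0.0, x) for x in v]

def matvec(M, v):
    # M is a list of rows; returns the matrix-vector product M @ v
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def encode(x, W_enc, b_enc, b_dec):
    # Assumption: the decoder bias is subtracted before encoding
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    # Project into feature space (W_enc has n_features rows), add encoder bias, apply ReLU
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return relu(pre)

def decode(f, W_dec, b_dec):
    # W_dec has d_model rows and n_features columns; add the decoder bias back
    return [r + b for r, b in zip(matvec(W_dec, f), b_dec)]

# Toy example: d_model = 2, n_features = 3 (instead of 768 and 46,080)
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

x = [0.5, -0.2]
features = encode(x, W_enc, b_enc, b_dec)   # only feature 0 fires
x_hat = decode(features, W_dec, b_dec)
```

With these toy weights only the first feature has a positive pre-activation, so the sparse code is [0.5, 0.0, 0.0] and the reconstruction is [0.5, 0.0].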

Negative Logits

thur      -0.63
pora      -0.60
acho      -0.58
VT        -0.56
TM        -0.54
atl       -0.53
UA        -0.52
 thor     -0.51
opers     -0.50
iframe    -0.50

Positive Logits

nered     0.75
dden      0.67
itud      0.67
gre       0.62
structed  0.61
umenthal  0.61
rand      0.61
ubric     0.59
stood     0.56
kered     0.54
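The logit lists rank vocabulary tokens by how strongly the feature's decoder direction raises or lowers their output logits. The following is a minimal sketch that assumes each score is the dot product of the feature's decoder row with the corresponding column of the unembedding matrix W_U. The dashboard's exact procedure, such as whether the final LayerNorm is accounted for, is not given here. logit_scores and top_tokens are hypothetical helpers.

```python
# Sketch: rank vocabulary tokens by a feature's direct effect on the logits.
# Assumption: score[t] = w_dec_row . W_U[:, t] (no LayerNorm correction).

def logit_scores(w_dec_row, W_U):
    # W_U is stored as d_model rows x vocab columns; one score per vocab token
    vocab_size = len(W_U[0])
    return [
        sum(w_dec_row[i] * W_U[i][t] for i in range(len(w_dec_row)))
        for t in range(vocab_size)
    ]

def top_tokens(scores, vocab, k, largest=True):
    # largest=True gives the "positive logits" list, False the "negative logits" list
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=largest)
    return [(vocab[t], scores[t]) for t in order[:k]]

# Toy example with d_model = 2 and a three-token vocabulary
vocab = ["a", "b", "c"]
w_dec_row = [1.0, -1.0]
W_U = [[0.5, 0.0, -0.25], [0.0, 0.5, 0.5]]

scores = logit_scores(w_dec_row, W_U)
positive = top_tokens(scores, vocab, k=1)
negative = top_tokens(scores, vocab, k=2, largest=False)
```

Here the scores are [0.5, -0.5, -0.75], so "a" heads the positive list and "c" heads the negative list.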

Activations Density 15.347%
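Activation density is read here as the share of tokens on which the feature's activation is above zero. Assuming it is measured over the dashboard prompts (12,288 prompts of 128 tokens, 1,572,864 tokens in total), the computation can be sketched as follows. The function name and threshold parameter are hypothetical.

```python
# Sketch: fraction of tokens on which a feature is active.
# Assumption: "active" means activation strictly greater than the threshold (0.0).

def activation_density(activations, threshold=0.0):
    # activations: one list of per-token activations per prompt
    total = 0
    active = 0
    for prompt in activations:
        for a in prompt:
            total += 1
            if a > threshold:
                active += 1
    return active / total

# Toy example: 2 prompts x 4 tokens, feature fires on 2 of 8 tokens
acts = [[0.0, 1.2, 0.0, 0.0], [0.3, 0.0, 0.0, 0.0]]
density = activation_density(acts)
```

For this toy input the density is 0.25, which the dashboard would display as 25%.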

No Known Activations