Explanations

words related to giving credit or acknowledgment

oai_token-act-pair · gpt-3.5-turbo

Configuration

neuronpedia/gpt2-small__res_scefr-ajt/6-res_scefr-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.6.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

Activation Function

relu
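To make the configuration above concrete, here is a minimal sketch of a sparse autoencoder forward pass with a ReLU activation. It assumes that "standard" means a single-hidden-layer autoencoder and that the decoder bias is subtracted before encoding; neither detail is confirmed by this page. The dimensions are toy stand-ins (GPT-2 small's residual stream is 768 wide, and the real dictionary has 46,080 features), the weights are random rather than the trained ones, and all function names are hypothetical.

```python
import random

D_MODEL = 4      # toy stand-in; GPT-2 small's residual stream is 768 wide
N_FEATURES = 8   # toy stand-in for the 46,080 features listed above


def relu(x):
    return x if x > 0.0 else 0.0


def encode(resid, w_enc, b_enc, b_dec):
    """Map one residual-stream vector to non-negative feature activations."""
    # Assumption: the decoder bias is subtracted before encoding.
    centered = [r - b for r, b in zip(resid, b_dec)]
    return [
        relu(sum(c * w_enc[i][j] for i, c in enumerate(centered)) + b_enc[j])
        for j in range(len(b_enc))
    ]


def decode(acts, w_dec, b_dec):
    """Reconstruct the residual-stream vector from feature activations."""
    return [
        sum(a * w_dec[j][i] for j, a in enumerate(acts)) + b_dec[i]
        for i in range(len(b_dec))
    ]


# Random weights for illustration only; these are not the trained parameters.
rng = random.Random(0)
w_enc = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(D_MODEL)]
w_dec = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(N_FEATURES)]
b_enc = [0.0] * N_FEATURES
b_dec = [0.0] * D_MODEL

# Stand-in for the activation captured at blocks.6.hook_resid_pre for one token.
resid = [rng.uniform(-1, 1) for _ in range(D_MODEL)]
acts = encode(resid, w_enc, b_enc, b_dec)
recon = decode(acts, w_dec, b_dec)
```

In a real pipeline the `resid` vector would come from the model at the listed hook point, once per token across the 128-token context.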

Negative Logits

SPONSORED

-0.81

fab

-0.74

çİĭ

-0.71

STON

-0.67

ichick

-0.66

Osw

-0.65

ews

-0.65

ools

-0.65

dose

-0.63

HF

-0.63

Positive Logits

 representation

0.62

 accuracy

0.60

 equal

0.60

 comparable

0.60

 Dying

0.57

 strictly

0.57

 understatement

0.56

 outnumbered

0.56

 reviewer

0.56

bra

0.55
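Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the tokens by the resulting score. Whether this page computes them exactly that way is an assumption. The sketch below uses a made-up 2-dimensional residual stream and placeholder tokens; none of the numbers come from this feature, and the function names are hypothetical.

```python
def logit_effects(decoder_dir, w_unembed, vocab):
    """Project one feature's decoder direction onto every vocabulary token."""
    return {
        tok: sum(d * w_unembed[i][t] for i, d in enumerate(decoder_dir))
        for t, tok in enumerate(vocab)
    }


def bottom_and_top(effects, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy values: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_unembed = [
    [0.5, 0.25, -0.5, -0.25],   # row for residual dimension 0
    [0.25, 0.5, -0.25, -0.5],   # row for residual dimension 1
]
decoder_dir = [1.0, 0.0]

effects = logit_effects(decoder_dir, w_unembed, vocab)
negative, positive = bottom_and_top(effects, k=2)
```

With these toy inputs, `negative` holds `tok_c` and `tok_d` and `positive` holds `tok_a` and `tok_b`, mirroring the two ranked lists shown on the page.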

Activations Density 0.106%
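As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of token positions on which a feature has a nonzero activation. It assumes density is defined as "share of tokens where the activation is greater than zero", which this page does not state explicitly; the sample activations are invented and the function name is hypothetical.

```python
def activation_density(activations):
    """Fraction of token positions where the feature is active (> 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0.0) / len(activations)


# Hypothetical activations over 8 token positions; 2 of them are nonzero.
sample = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.25, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "25.000%"
```

A value as low as 0.106% would mean the feature fires on roughly one token in a thousand under this definition.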
