Explanations

instances of sexual assault and related violent actions

oai_token-act-pair · gpt-3.5-turbo

Configuration

neuronpedia/gpt2-small__res_scl-ajt/6-res_scl-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.6.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

6

Activation Function

relu
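
The settings above can be gathered into one small record to check how they relate to each other. This is a minimal sketch, not Neuronpedia's or any library's API: the class name `SaeConfig` and its field and method names are hypothetical, and the residual width of 768 (the model dimension of GPT-2 small) is an assumption, since the page does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    """Hypothetical container for the values listed on this page."""

    hook_point: str = "blocks.6.hook_resid_pre"
    n_features: int = 46_080
    context_size: int = 128
    dtype: str = "torch.float32"
    activation_fn: str = "relu"
    architecture: str = "standard"
    n_prompts: int = 12_288
    # Assumption: residual stream width of GPT-2 small; not stated on the page.
    d_model: int = 768

    def hook_layer(self) -> int:
        # Hook names follow the "blocks.<layer>.<hook>" pattern shown above.
        return int(self.hook_point.split(".")[1])

    def expansion_factor(self) -> float:
        # Number of dictionary features per residual dimension.
        return self.n_features / self.d_model

    def dashboard_tokens(self) -> int:
        # Total tokens covered by the dashboard prompts.
        return self.n_prompts * self.context_size


cfg = SaeConfig()
print(cfg.hook_layer(), cfg.expansion_factor(), cfg.dashboard_tokens())
```

Under these assumptions the dictionary is 60 times wider than the residual stream, and the dashboard prompts cover 1,572,864 tokens in total.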

Negative Logits

issue

-0.32

immune

-0.32

issues

-0.32

hire

-0.30

fare

-0.30

repair

-0.30

tag

-0.30

fly

-0.30

gap

-0.30

bid

-0.29

Positive Logits

 Meredith

0.30

 Samantha

0.30

 prostitutes

0.29

nesday

0.28

 TAMADRA

0.28

 raping

0.28

agog

0.27

 Paige

0.27

osexual

0.27

 innoc

0.27

Activations Density 6.307%
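
A density figure like the one above is usually read as the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the threshold of zero, and the toy activation values are assumptions, not the computation the dashboard actually performs.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy per-token activations for one feature (made-up values).
toy = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
density = activation_density(toy)
print(f"{density:.3%}")
```

With a ReLU activation function, inactive tokens are exactly zero, which is why a threshold of zero is a natural default in this sketch.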

No Known Activations