Explanations

phrases related to prisons, imprisonment, and criminal justice

oai_token-act-pair · gpt-3.5-turbo Triggered by @bot

Top Features by Cosine Similarity

Comparing With GPT2-SMALL @ 8-res_fs1536-jb

Configuration

jbloom/GPT2-Small-Feature-Splitting-Experiment-Layer-8/blocks.8.hook_resid_pre_1536

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 1,536
Data Type: torch.float32
Hook Point: blocks.8.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: Skylion007/openwebtext
Hook Point Layer:
Activation Function: relu

Negative Logits

Token               Logit
"£ı"                -0.80
"pub"               -0.71
"ãĥ£"               -0.65
" Cyber"            -0.62
"soDeliveryDate"    -0.60
"ergy"              -0.60
" seller"           -0.60
" Toast"            -0.59
"thora"             -0.59
"umers"             -0.59

Positive Logits

Token               Logit
" inmates"          1.23
" prisoners"        1.11
" inmate"           1.09
" confinement"      1.08
" detention"        1.05
" prisoner"         0.97
"prison"            0.96
" detainees"        0.96
" detain"           0.96
" incarcerated"     0.93

Activations Density 2.316%
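
A minimal sketch of how an activation density figure like the one above could be computed, assuming density means the fraction of token positions where the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and only for illustration; the dashboard's actual computation is not shown in the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    `activations` is a list of prompts, each a list of per-token activation
    values for a single feature. Assumption: a position counts as "active"
    when its value is strictly greater than `threshold`.
    """
    flat = [a for prompt in activations for a in prompt]
    if not flat:
        return 0.0
    active = sum(1 for a in flat if a > threshold)
    return active / len(flat)


# Toy data (hypothetical): 2 prompts of 4 tokens each, 2 active positions.
toy = [
    [0.0, 0.0, 1.5, 0.0],
    [0.0, 2.0, 0.0, 0.0],
]
density = activation_density(toy)
print(f"{density:.3%}")  # 25.000%
```

Because the activation function listed in the configuration is relu, activations are never negative, so a threshold of zero separates active from inactive positions in this sketch.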
