INDEX

Explanations

political terms related to power dynamics

oai_token-act-pair · gpt-3.5-turbo

Configuration

neuronpedia/gpt2-small__res_scl-ajt/6-res_scl-ajt

Prompts (Dashboard)

12,288 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

46,080

Data Type

torch.float32

Hook Point

blocks.6.hook_resid_pre

Architecture

standard

Context Size

128

Dataset

apollo-research/Skylion007-openwebtext-tokenizer-gpt2

Hook Point Layer

6

Activation Function

relu
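
The configuration above describes a "standard" architecture with a relu activation, reading from blocks.6.hook_resid_pre. As a rough illustration only, the encoder of such a sparse autoencoder could be sketched as below. The function and variable names are hypothetical, the weights are made-up toy numbers, and the exact encoder form (for example, whether a decoder bias is subtracted from the input first) is an assumption not stated on this page. The real sizes would be 768 residual dimensions for GPT-2 small and 46,080 features.

```python
# Hypothetical sketch of a "standard" ReLU sparse autoencoder encoder.
# Real sizes for this dashboard would be d_model = 768 (GPT-2 small) and
# n_features = 46,080; tiny sizes are used here so the example runs instantly.


def relu(values):
    """Elementwise ReLU: negative pre-activations become 0."""
    return [max(0.0, v) for v in values]


def sae_encode(resid, w_enc, b_enc):
    """Map one residual-stream vector to feature activations.

    resid: list of d_model floats (as read at blocks.6.hook_resid_pre)
    w_enc: d_model rows, each holding n_features weights (assumed layout)
    b_enc: list of n_features biases
    """
    pre_acts = [
        sum(resid[i] * w_enc[i][j] for i in range(len(resid))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre_acts)


# Toy example: d_model = 2, n_features = 3 (made-up numbers, not real weights).
toy_resid = [2.0, 1.0]
toy_w_enc = [[1.0, -1.0, 0.5],
             [0.5, 1.0, 0.0]]
toy_b_enc = [0.0, 0.0, -1.0]
toy_acts = sae_encode(toy_resid, toy_w_enc, toy_b_enc)
print(toy_acts)  # [2.5, 0.0, 0.0]
```

Only the first toy feature fires here. The other two have non-positive pre-activations and are zeroed by the ReLU, which is what makes the feature activations sparse.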

Negative Logits

minist

-0.41

itas

-0.38

 Hurricanes

-0.38

 constit

-0.35

••••

-0.33

ILY

-0.32

hur

-0.32

 Jonah

-0.31

 Prometheus

-0.31

pend

-0.31

Positive Logits

 Lobby

0.47

heid

0.47

glers

0.45

ranch

0.45

bie

0.43

abies

0.42

brew

0.42

anova

0.41

gee

0.41

ibaba

0.40
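
The negative and positive logit lists above pair output tokens with a score. One common way to obtain such scores, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below uses hypothetical names and a made-up four-token vocabulary, and it ignores the final layer norm.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's decoder
# direction raises or lowers their logits. All numbers are made up.


def logit_effects(decoder_dir, w_unembed):
    """Dot the decoder direction with each vocabulary column of w_unembed.

    decoder_dir: list of d_model floats
    w_unembed:   d_model rows, each holding n_vocab weights (assumed layout)
    """
    n_vocab = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(n_vocab)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


toy_vocab = ["a", "b", "c", "d"]
toy_decoder_dir = [1.0, -1.0]
toy_w_unembed = [[0.5, -0.2, 0.0, 0.1],
                 [0.1, 0.3, -0.2, 0.1]]

effects = logit_effects(toy_decoder_dir, toy_w_unembed)
positive, negative = top_and_bottom(effects, toy_vocab, k=2)
print([tok for tok, _ in positive])  # ['a', 'c']
print([tok for tok, _ in negative])  # ['b', 'd']
```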

Activations Density 11.049%
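
As a back-of-the-envelope reading of the density figure, and assuming it is measured over the dashboard prompt set listed in the configuration (12,288 prompts of 128 tokens each, an assumption this page does not confirm), the implied number of tokens on which the feature is active can be worked out as follows.

```python
# Rough arithmetic only; assumes density is the fraction of dashboard tokens
# on which the feature has a non-zero activation.
n_prompts = 12_288
tokens_per_prompt = 128
density_percent = 11.049

total_tokens = n_prompts * tokens_per_prompt
active_tokens = round(total_tokens * density_percent / 100)

print(total_tokens)   # 1572864
print(active_tokens)  # 173786
```

Under that assumption the feature would fire on roughly one token in nine.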

No Known Activations