INDEX

Explanations

references to strikes or violent actions and their implications

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

ctigges/pythia-70m-deduped__mlp-sm_processed/2-mlp-sm

Prompts (Dashboard)

32,768 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

32,768

Data Type

torch.float32

Hook Name

blocks.2.hook_mlp_out

Hook Layer

2

Architecture

standard

Context Size

128

Dataset

EleutherAI/the_pile_deduplicated

Activation Function

relu
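
The configuration above (a "standard" architecture with a relu activation, reading from blocks.2.hook_mlp_out) suggests a one-hidden-layer sparse autoencoder. The following is a minimal sketch of such an encoder, not the actual implementation behind this page. The names sae_encode, W_enc, b_enc, and b_dec are hypothetical, subtracting a decoder bias before encoding is an assumption, and the toy sizes stand in for the real 32,768-feature dictionary.

```python
import random


def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]


def sae_encode(x, W_enc, b_enc, b_dec):
    """Assumed encoder: f = relu((x - b_dec) @ W_enc + b_enc).

    x     : MLP-output vector at one token position (length d_model)
    W_enc : d_model x n_features weight matrix (list of rows)
    b_enc : encoder bias (length n_features)
    b_dec : decoder bias subtracted from the input (length d_model)
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [
        sum(c * W_enc[i][j] for i, c in enumerate(centered)) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre_acts)


# Toy sizes for illustration only; the page lists 32,768 features.
D_MODEL, N_FEATURES = 4, 8
rng = random.Random(0)
W_enc = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(D_MODEL)]
b_enc = [0.0] * N_FEATURES
b_dec = [0.0] * D_MODEL
x = [rng.uniform(-1, 1) for _ in range(D_MODEL)]

acts = sae_encode(x, W_enc, b_enc, b_dec)
```

Because of the relu, every feature activation is non-negative, and a feature counts as "firing" on a token only when its pre-activation is above zero.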

Negative Logits

Token          Logit
"piece"        -1.76
"pieces"       -1.62
"ality"        -1.55
"ffer"         -1.54
"market"       -1.51
" compared"    -1.51
"ifies"        -1.48
" Owner"       -1.46
"xico"         -1.44
"iat"          -1.41

Positive Logits

Token          Logit
"cy"           1.64
" strict"      1.60
"cens"         1.59
"ĨĴ"           1.55
"ĻĤ"           1.52
" violet"      1.51
" thrill"      1.47
"eric"         1.43
" paran"       1.42
" solitary"    1.42
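
The page does not say how the logit lists are produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below uses a hypothetical helper logit_effects and made-up toy weights; its numbers are not the values listed above.

```python
def logit_effects(decoder_dir, W_U, vocab):
    """Score each token by the dot product of decoder_dir with its unembedding column.

    decoder_dir : the feature's decoder vector (length d_model)
    W_U         : d_model x vocab_size unembedding matrix (list of rows)
    vocab       : token strings, one per column of W_U
    Returns (token, score) pairs sorted from most positive to most negative.
    """
    scores = [
        sum(d * W_U[i][t] for i, d in enumerate(decoder_dir))
        for t in range(len(vocab))
    ]
    return sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)


# Toy example: d_model = 2, vocabulary of 4 tokens (weights are invented).
vocab = ["cy", " strict", "piece", " compared"]
W_U = [
    [1.0, 0.5, -1.0, -0.5],
    [0.5, 0.5, -1.0, 0.0],
]
decoder_dir = [1.0, 0.5]

ranked = logit_effects(decoder_dir, W_U, vocab)
top_positive = ranked[:2]        # tokens the feature pushes up
top_negative = ranked[-2:][::-1]  # tokens the feature pushes down, most negative first
```

Since this feature reads from layer 2 of the model, a direct projection like this ignores every later layer and the final normalization, so the scores are best read as a rough indication of which tokens the feature promotes or suppresses.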

Activations Density 2.170%
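
As a rough worked example, the density figure can be related to the dashboard prompt set listed under Configuration. This sketch assumes density means the fraction of token positions where the feature's activation is above zero, and that it is measured over the 32,768 dashboard prompts of 128 tokens each. Neither assumption is confirmed by the page, and activation_density is a hypothetical helper.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0).

    activations: list of prompts, each a list of per-token activation values.
    """
    flat = [a for prompt in activations for a in prompt]
    return sum(1 for a in flat if a > 0) / len(flat)


# Toy data: the feature fires on 2 of 8 token positions.
toy_activations = [
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0],
]
density = activation_density(toy_activations)  # 0.25

# Scale implied by the page's numbers, under the assumptions stated above.
N_PROMPTS, CONTEXT_SIZE = 32_768, 128
total_tokens = N_PROMPTS * CONTEXT_SIZE          # 4,194,304 token positions
approx_active = round(total_tokens * 0.02170)    # about 91,016 positions
```

Under these assumptions, a density of 2.170% corresponds to the feature firing on roughly 91,000 of about 4.2 million token positions.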

No Known Activations
