Explanations

instances of the word "approved" related to various contexts

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

google/gemma-scope-9b-pt-mlp/layer_6/width_131k/average_l0_72
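The identifier above packs several settings into one path. As a minimal sketch, the helper below splits it into labeled parts. The function name parse_sae_id and the dictionary keys are hypothetical, and reading average_l0_72 as "about 72 features active per token on average" is an assumption based on the usual naming convention, not something this page states.

```python
def parse_sae_id(sae_id: str) -> dict:
    """Split a five-part SAE identifier into labeled fields (hypothetical helper)."""
    org, release, layer, width, l0 = sae_id.split("/")
    return {
        "org": org,
        "release": release,
        # "layer_6" -> 6
        "layer": int(layer.rsplit("_", 1)[1]),
        # "width_131k" -> "131k" (kept as a label, not converted to a number)
        "width_label": width.rsplit("_", 1)[1],
        # "average_l0_72" -> 72
        "average_l0": int(l0.rsplit("_", 1)[1]),
    }


info = parse_sae_id("google/gemma-scope-9b-pt-mlp/layer_6/width_131k/average_l0_72")
print(info)
```

The width is left as the label "131k" because the exact feature count is listed separately under Features.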

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

131,072

Data Type

float32

Hook Name

blocks.6.hook_mlp_out

Hook Layer

6

Architecture

jumprelu

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Activation Function

relu
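The configuration lists both a jumprelu architecture and a relu activation function, and the page does not say how the two fields relate. For orientation only, the sketch below contrasts the two functions: ReLU zeroes negative inputs, while JumpReLU zeroes every input that does not exceed a per-feature threshold and passes the rest through unchanged. The input values and thresholds are hypothetical, not taken from this SAE.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    # Zero out negative pre-activations.
    return np.maximum(x, 0.0)


def jumprelu(x: np.ndarray, threshold: np.ndarray) -> np.ndarray:
    # Keep x unchanged where it exceeds the threshold, zero elsewhere.
    return np.where(x > threshold, x, 0.0)


# Hypothetical pre-activations and thresholds, for illustration only.
pre = np.array([-0.5, 0.2, 0.8, 1.5])
theta = np.array([0.5, 0.5, 0.5, 0.5])

relu_out = relu(pre)
jump_out = jumprelu(pre, theta)
print(relu_out)
print(jump_out)
```

With these inputs, the value 0.2 survives ReLU but is suppressed by JumpReLU because it falls below its threshold.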

Negative Logits

 approved

-1.85

approved

-1.66

 Approved

-1.63

Approved

-1.52

 APPROVED

-1.48

APPROVED

-1.20

 aprobado

-1.19

 approve

-1.15

 approves

-1.14

 approu

-1.12

Positive Logits

 ngang

0.37

 near

0.27

ẹp

0.27

 مل

0.27

flä

0.26

 fair

0.26

 Mili

0.26

 facie

0.25

購

0.25

 guste

0.25
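The page does not state how these logit values are computed. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting per-token scores. The sketch below does this with small random arrays; w_dec_feature and w_unembed are hypothetical stand-ins, and a real pipeline may apply further steps such as a final normalization layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50  # toy sizes, not the real model dimensions

# Hypothetical stand-ins for one decoder row and the unembedding matrix.
w_dec_feature = rng.normal(size=d_model)
w_unembed = rng.normal(size=(d_model, vocab))

# Per-token effect of writing the feature direction into the output.
logit_effect = w_dec_feature @ w_unembed  # shape: (vocab,)

order = np.argsort(logit_effect)   # ascending
top_negative = order[:10]          # most suppressed token ids
top_positive = order[::-1][:10]    # most promoted token ids

print(top_negative, logit_effect[top_negative])
print(top_positive, logit_effect[top_positive])
```

In the listing above, the negative entries are larger in magnitude than the positive ones (-1.85 versus 0.37), so under this reading the feature mainly lowers the scores of "approved" variants rather than promoting any particular token.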

Activations Density 0.004%
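To give the density figure a sense of scale, the short calculation below converts it into an approximate token count. It assumes the density is measured over the dashboard prompts listed in the configuration (24,576 prompts of 128 tokens each), which the page does not confirm, and the displayed 0.004% is rounded, so the result is only a rough estimate.

```python
n_prompts = 24_576
tokens_per_prompt = 128
density = 0.004 / 100  # 0.004% as a fraction (rounded value from the page)

total_tokens = n_prompts * tokens_per_prompt
expected_active = density * total_tokens

print(total_tokens)
print(round(expected_active))
```

Under these assumptions the feature fires on roughly a hundred-odd tokens out of about three million.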
