
Explanations

instances of the word "knew" in various contexts

oai_token-act-pair · gpt-4o-mini Triggered by @bot

knew

np_max-act-logits · gemini-2.0-flash Triggered by @johnny


Configuration

google/gemma-scope-2b-pt-res/layer_12/width_1m/average_l0_107

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

1,048,576

Data Type

float32

Hook Name

blocks.12.hook_resid_post

Hook Layer

12

Architecture

jumprelu

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Activation Function

relu
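The configuration lists a JumpReLU architecture. As a rough illustration, here is a minimal NumPy sketch of a JumpReLU encoder with a linear decoder. A pre-activation is kept only where it exceeds a learned per-feature threshold and is zeroed everywhere else. The function names, toy dimensions, random weights, and the 0.5 threshold are all hypothetical. This is not the Gemma Scope or SAELens code, and it ignores how the thresholds are trained.

```python
import numpy as np

def jumprelu_encode(x, W_enc, b_enc, theta):
    """Encode residual-stream vectors x into SAE feature activations.

    JumpReLU keeps a pre-activation only where it exceeds the
    per-feature threshold theta; everything else is set to zero.
    """
    pre = x @ W_enc + b_enc                 # (n_tokens, n_features)
    return np.where(pre > theta, pre, 0.0)  # z * H(z - theta)

def decode(f, W_dec, b_dec):
    """Reconstruct the residual stream from feature activations."""
    return f @ W_dec + b_dec

# Toy sizes (hypothetical): the real SAE has 1,048,576 features.
rng = np.random.default_rng(0)
d_model, n_features = 8, 32
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
theta = np.full(n_features, 0.5)            # hypothetical thresholds
W_dec = rng.normal(size=(n_features, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=(4, d_model))           # 4 fake token positions
f = jumprelu_encode(x, W_enc, b_enc, theta)
x_hat = decode(f, W_dec, b_dec)
print(f.shape, x_hat.shape)  # (4, 32) (4, 8)
```

Every nonzero activation this produces is above its threshold, which is what distinguishes JumpReLU from a plain ReLU.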


Negative Logits

"modelBuilder"  -0.85
"ạn"  -0.81
" gole"  -0.80
" vzduchu"  -0.80
"mbor"  -0.77
"AQ"  -0.75
" Trafalgar"  -0.73
"Slf"  -0.72
" Вален"  -0.72
"火"  -0.72

Positive Logits

" knew"  1.38
"knew"  1.28
" Knew"  1.22
"Twas"  0.97
"twas"  0.96
" знал"  0.94  (Russian "knew", masculine singular)
"=$?"  0.91
" знали"  0.89  (Russian "knew", plural)
"ıyordu"  0.86
"랐"  0.84
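Logit lists like these are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The sketch below assumes that procedure, since the page does not describe how its lists were computed. The function name, the three-dimensional toy weights, and the six-token vocabulary are hypothetical, with values chosen to mirror a few entries above.

```python
import numpy as np

def top_logits(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary.

    Returns the k most negative and k most positive (token, logit) pairs.
    """
    logits = w_dec_row @ W_U                 # (vocab_size,)
    order = np.argsort(logits)               # ascending order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy weights: d_model = 3, vocabulary of 6 tokens.
vocab = [" knew", "knew", " Knew", " gole", "modelBuilder", " the"]
w_dec_row = np.array([1.0, 0.0, 0.0])
W_U = np.array([
    [1.38, 1.28, 1.22, -0.81, -0.85, 0.0],
    [0.0] * 6,
    [0.0] * 6,
])

neg, pos = top_logits(w_dec_row, W_U, vocab, k=2)
print(pos)  # [(' knew', 1.38), ('knew', 1.28)]
print(neg)  # [('modelBuilder', -0.85), (' gole', -0.81)]
```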

Activation Density 0.003%
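Activation density is the fraction of token positions on which the feature fires. The minimal sketch below assumes two things: that density counts activations greater than zero, and that the token sample is shaped like the dashboard prompts (24,576 × 128 = 3,145,728 positions). The page does not say which tokens it uses. At that size, 0.003% corresponds to roughly 94 positions. Because the displayed value is rounded, the true count could lie anywhere from about 79 to 110. The synthetic array is hypothetical and is built only to reproduce that figure.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the feature fires (activation > 0)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Hypothetical activations for one feature over 24,576 prompts x 128 tokens.
n_prompts, n_tokens = 24_576, 128
acts = np.zeros((n_prompts, n_tokens), dtype=np.float32)
acts[:94, 0] = 1.0  # pretend the feature fires on 94 token positions

density = activation_density(acts)
print(f"{density:.3%}")  # 0.003%
```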


No Known Activations