Explanations

first-person statements containing caveats or conditions (oai_token-act-pair · gemini-2.0-flash)

variations of the word "this" (np_token-act-pair-logits · gpt-4o-mini)

"this" in different languages (np_max-act-logits · gemini-2.0-flash)

Configuration

google/gemma-scope-2b-pt-transcoders/layer_21/width_16k/average_l0_13

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.21.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
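
For readers unfamiliar with the jumprelu_transcoder architecture listed above, the following is a minimal pure-Python sketch of a JumpReLU transcoder forward pass. It rests on assumptions not stated on this page: that the transcoder reads the normalized MLP input (suggested by the hook name blocks.21.ln2.hook_normalized) and predicts the MLP output, and that JumpReLU passes a pre-activation through unchanged when it exceeds a per-feature threshold and zeroes it otherwise. The function names, weight layout, and toy values are all hypothetical and are not taken from the released weights or any library API.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per feature


def jumprelu(pre_acts: Vector, thresholds: Vector) -> Vector:
    """Keep a pre-activation only if it is strictly above its threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def transcoder_forward(
    x: Vector,
    w_enc: Matrix,       # shape: n_features x d_in (hypothetical layout)
    b_enc: Vector,       # shape: n_features
    thresholds: Vector,  # shape: n_features
    w_dec: Matrix,       # shape: n_features x d_out (hypothetical layout)
    b_dec: Vector,       # shape: d_out
) -> Tuple[Vector, Vector]:
    """Return (feature activations, predicted output) for one token."""
    # Encode: one dot product per feature, plus bias.
    pre_acts = [
        sum(xi * wi for xi, wi in zip(x, row)) + b
        for row, b in zip(w_enc, b_enc)
    ]
    acts = jumprelu(pre_acts, thresholds)
    # Decode: weighted sum of decoder rows, plus output bias.
    out = [
        sum(a * w_dec[f][j] for f, a in enumerate(acts)) + b_dec[j]
        for j in range(len(b_dec))
    ]
    return acts, out


# Toy example with 2 input dims, 3 features, 2 output dims.
toy_acts, toy_out = transcoder_forward(
    x=[1.0, 2.0],
    w_enc=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    b_enc=[0.0, 0.0, 0.0],
    thresholds=[1.5, 1.5, 1.5],
    w_dec=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    b_dec=[0.5, 0.5],
)
```

In the toy example only the second feature clears its threshold, so the output is the decoder bias plus that feature's decoder row scaled by its activation. In the real configuration there would be 16,384 features, of which only a handful are active per token on average.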

Negative Logits

"way"           -0.47
"它们"          -0.40
"它們"          -0.40
"getValueAt"    -0.37
" يتيمه"        -0.36
" Here"         -0.35
"respectively"  -0.35
"way"           -0.33
"itatea"        -0.33
"idade"         -0.32

Positive Logits

" this"    2.89
"this"     2.13
" questo"  1.73
" этого"   1.71
" THIS"    1.66
" questa"  1.55
" هذا"     1.52
" этот"    1.52
" diesem"  1.51
" этом"    1.51

Activations Density 7.547%
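
As a rough illustration of what an activation density figure like the one above measures, here is a minimal sketch that computes the fraction of tokens on which a feature is active. It assumes density means "share of tokens with a nonzero activation" and, for the back-of-envelope count, that it is measured over the dashboard prompts (24,576 prompts of 128 tokens each). Neither assumption is confirmed on this page, and the function name and toy data are hypothetical.

```python
from typing import List


def activation_density(activations: List[List[float]], eps: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly greater than eps.

    `activations` holds one list per prompt, one float per token.
    """
    total = 0
    active = 0
    for prompt in activations:
        for a in prompt:
            total += 1
            if a > eps:
                active += 1
    return active / total if total else 0.0


# Toy data: 2 prompts x 4 tokens, feature fires on 2 of 8 tokens.
toy = [
    [0.0, 1.2, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
toy_density = activation_density(toy)

# Back-of-envelope count under the stated assumption that the 7.547%
# figure is measured over the dashboard prompts.
TOTAL_TOKENS = 24_576 * 128
APPROX_ACTIVE_TOKENS = round(0.07547 * TOTAL_TOKENS)
```

Under those assumptions the feature would fire on roughly 237 thousand of about 3.1 million tokens, which is fairly dense for a sparse feature and is consistent with it tracking a very common word such as "this".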
