Explanations

apostrophes and their surrounding words.

oai_token-act-pair · gemini-2.0-flash

apostrophes

np_max-act-logits · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_20/width_16k/average_l0_11

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.20.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
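
As a rough illustration of the jumprelu_transcoder architecture named above, the sketch below shows only the JumpReLU activation: a pre-activation passes through unchanged when it exceeds a per-feature threshold and is zeroed otherwise. The function name, input values, and thresholds are hypothetical and are not taken from this feature's weights. Reading average_l0_11 as "about 11 features active per token on average" is an assumption based on the naming of the configuration path.

```python
from typing import List


def jumprelu(pre_acts: List[float], thresholds: List[float]) -> List[float]:
    """Apply JumpReLU elementwise: keep z if z > threshold, else 0."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


# Hypothetical pre-activations and thresholds for four features
# (illustrative values only, not read from the transcoder).
pre_acts = [0.2, 1.5, -0.3, 0.9]
thresholds = [0.5, 1.0, 0.1, 0.8]

acts = jumprelu(pre_acts, thresholds)

# L0 for this token: the number of features with non-zero activation.
l0 = sum(1 for a in acts if a != 0.0)

print(acts, l0)
```

Unlike a plain ReLU, a small positive pre-activation (0.2 against a threshold of 0.5 here) is suppressed entirely, which is how the threshold controls how many features are active per token.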

Negative Logits

 referenties

-0.56

 kasarigan

-0.56

culable

-0.55

cency

-0.50

fated

-0.49

зулта

-0.49

*/,

-0.47

PathVariable

-0.45

Hiya

-0.45

**)

-0.44

Positive Logits

<bos>

0.75

 étoient

0.53

 avoient

0.50

 vérit

0.45

 NDEBUG

0.44

issenschaft

0.44

voorbeeld

0.42

 odeur

0.42

 étoit

0.41

 itſelf

0.41

Activations Density 11.106%
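
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes that the density is the fraction of dashboard tokens on which the feature is non-zero, and that the denominator is the 24,576 prompts of 128 tokens listed under Configuration. The page does not state either point, so the result is an estimate under those assumptions, not a reported value.

```python
# Dashboard prompt set, as listed under Configuration.
num_prompts = 24_576
tokens_per_prompt = 128

# Activation density shown on the page, in percent.
density_percent = 11.106

total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = (tokens with non-zero activation) / total tokens.
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```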
