Explanations

words related to someone's education and career, especially in the first person

oai_token-act-pair · gemini-2.0-flash

possessive pronouns

np_max-act-logits · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_25/width_16k/average_l0_41

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.25.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
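
As a rough illustration of what the `jumprelu_transcoder` architecture implies, the sketch below runs a toy forward pass: an encoder maps the hooked input to feature pre-activations, JumpReLU zeroes every pre-activation that does not exceed its per-feature threshold, and a decoder maps the surviving features to an output vector. All names (`jumprelu`, `transcoder_forward`, `w_enc`, `w_dec`, and so on) are hypothetical, the weights are made-up toy values, and the assumption that the transcoder reads the normalized MLP input and predicts the MLP output is not stated on this page.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def jumprelu(pre_acts: Vector, thresholds: Vector) -> Vector:
    """Pass a pre-activation through unchanged only if it exceeds its threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def transcoder_forward(
    x: Vector,
    w_enc: Matrix,
    b_enc: Vector,
    thresholds: Vector,
    w_dec: Matrix,
    b_dec: Vector,
) -> Tuple[Vector, Vector]:
    """Toy transcoder pass: encode, apply JumpReLU, decode.

    w_enc has one row per feature; w_dec has one row per output dimension.
    """
    pre_acts = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = jumprelu(pre_acts, thresholds)
    output = [o + b for o, b in zip(matvec(w_dec, features), b_dec)]
    return features, output


# Toy sizes: 2 input dims, 3 features, 2 output dims (the real width is 16,384).
x = [1.0, -2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 1.25]
thresholds = [0.5, 0.5, 0.5]
w_dec = [[2.0, 0.0, 0.0], [0.0, 3.0, 1.0]]
b_dec = [0.5, 0.0]

features, output = transcoder_forward(x, w_enc, b_enc, thresholds, w_dec, b_dec)
print(features)  # [1.0, 0.0, 0.0]
print(output)    # [2.5, 0.0]
```

The third toy feature has a pre-activation of 0.25, which is positive but below its threshold of 0.5, so JumpReLU suppresses it where a plain ReLU would have let it through.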

Negative Logits

 varandra       -0.88
 européennes    -0.81
 récentes       -0.81
 démocr         -0.81
 fondament      -0.79
 dépens         -0.79
 élevées        -0.77
 colorés        -0.77
 actuelles      -0.75
 chré           -0.75

Positive Logits

is              0.86
his             0.84
 their          0.84
 here           0.76
she             0.75
he              0.71
are             0.67
 they           0.65
its             0.65
her             0.60
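
One common way to produce logit lists like the two above is to project a feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below shows that ranking on made-up numbers. It is an assumption that this page's values were computed this way, and `logit_effects`, `top_and_bottom`, the toy `unembed` rows, and the toy `decoder_vec` are all hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_vec: List[float],
    unembed: List[List[float]],
    vocab: List[str],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding row."""
    return {
        token: sum(d * u for d, u in zip(decoder_vec, row))
        for token, row in zip(vocab, unembed)
    }


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed tokens."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Toy 2-dimensional example; real unembedding rows have the model's hidden size.
vocab = [" their", " here", " varandra"]
unembed = [[0.5, -0.25], [0.5, 0.0], [-0.5, 0.5]]
decoder_vec = [1.0, -1.0]

scores = logit_effects(decoder_vec, unembed, vocab)
positive, negative = top_and_bottom(scores, k=1)
print(positive)  # [(' their', 0.75)]
print(negative)  # [(' varandra', -1.0)]
```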

Activations Density 9.501%
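
For orientation, an activation density can be read as the fraction of token positions on which the feature fires. The sketch below computes that fraction for a toy batch and also works out how many token positions the dashboard run covers (24,576 prompts of 128 tokens each). It assumes "fires" means an activation strictly greater than zero, which this page does not state, and `activation_density` is a hypothetical helper rather than part of any dashboard API.

```python
from typing import List


def activation_density(activations: List[List[float]]) -> float:
    """Fraction of token positions with activation > 0 (assumed definition)."""
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > 0.0:
                active += 1
    return active / total if total else 0.0


# Toy batch: 2 prompts of 4 tokens, with the feature firing on 2 of 8 positions.
toy_batch = [
    [0.0, 1.2, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
]
print(activation_density(toy_batch))  # 0.25

# Token positions covered by the dashboard prompts listed on this page.
DASHBOARD_TOKENS = 24_576 * 128
print(DASHBOARD_TOKENS)  # 3145728
```

Under that assumed definition, a density of 9.501% would mean the feature is active on a little under one token position in ten, which is fairly dense for a sparse feature.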
