Explanations

words associated with conflict, visibility, and negative emotions

oai_token-act-pair · gemini-2.0-flash

revealing secrets/feelings

np_max-act · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_4/width_16k/average_l0_88

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

16,384

Data Type

float32

Hook Name

blocks.4.ln2.hook_normalized

Architecture

jumprelu_transcoder

Context Size

1,024

Dataset

monology/pile-uncopyrighted
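
The `jumprelu_transcoder` architecture listed above can be illustrated with a toy forward pass. This is a minimal sketch, not the Gemma Scope implementation: it assumes the usual JumpReLU form (a pre-activation passes through unchanged only when it exceeds a per-feature threshold) and a transcoder that encodes the hooked input and decodes to a separate output vector. The function names, weight layout (one encoder and one decoder vector per feature), and toy numbers are all hypothetical.

```python
def jumprelu(pre_acts, thresholds):
    """Keep each pre-activation only if it exceeds its feature's threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def transcoder_forward(x, W_enc, b_enc, W_dec, b_dec, thresholds):
    """Toy transcoder: encode x into sparse features, then decode to an output.

    W_enc and W_dec are lists with one vector per feature (assumed layout).
    """
    # Encoder: dot product of the input with each feature's encoder vector.
    pre_acts = [
        sum(xi * w for xi, w in zip(x, enc_vec)) + b
        for enc_vec, b in zip(W_enc, b_enc)
    ]
    feats = jumprelu(pre_acts, thresholds)

    # Decoder: bias plus the activation-weighted sum of decoder vectors.
    out = list(b_dec)
    for f, dec_vec in zip(feats, W_dec):
        for j, w in enumerate(dec_vec):
            out[j] += f * w
    return feats, out


# Toy sizes only; the transcoder described on this page has 16,384 features.
x = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -1.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_dec = [0.0, 0.0]
thresholds = [1.5, 1.5, 1.5]

feats, out = transcoder_forward(x, W_enc, b_enc, W_dec, b_dec, thresholds)
print(feats, out)
```

In this toy case the first feature has a positive pre-activation (1.0) but stays at zero because it does not clear its threshold, which is the behaviour that distinguishes JumpReLU from a plain ReLU.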

Negative Logits

})->

-0.62

ConfigureAwait

-0.62

,:);

-0.61

"}}

-0.61

})();

-0.59

 Verd

-0.59

-0.58

();}

-0.57

 Kars

-0.56

❖

-0.56

Positive Logits

ValueStyle

0.56

addCriterion

0.52

дото

0.51

ugin

0.44

 possibilité

0.43

oreille

0.43

 Initialized

0.43

...

0.43

refundable

0.42

httphttps

0.42

Activations Density 1.638%
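
To give the density figure a sense of scale, the arithmetic below converts it to an approximate token count. It assumes, without confirmation from the page, that density means the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each). The variable names are illustrative.

```python
# Values copied from this page.
n_prompts = 24_576
tokens_per_prompt = 128
density = 0.01638  # 1.638%

# Assumption: density is measured over every token in the dashboard prompts.
total_tokens = n_prompts * tokens_per_prompt
approx_active_tokens = round(total_tokens * density)

print(total_tokens, approx_active_tokens)
```

Under those assumptions the feature would fire on roughly 51,500 of about 3.1 million tokens.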

No Known Activations