
Explanations

the word "neutral" and words related to it (oai_token-act-pair · gemini-2.0-flash)

neutral (np_max-act · gemini-2.0-flash)


Configuration

google/gemma-scope-2b-pt-transcoders/layer_0/width_16k/average_l0_76

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.0.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
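The Architecture field names a JumpReLU transcoder. As a rough illustration of what the activation function does, the sketch below applies JumpReLU as it is generally described: each feature's pre-activation passes through unchanged when it exceeds that feature's threshold and is set to zero otherwise. The function name, the pre-activations, and the thresholds are made up for illustration and are not taken from this transcoder's weights.

```python
def jumprelu(pre_activations, thresholds):
    """Apply JumpReLU elementwise: keep z when z > theta, else output 0.0."""
    return [
        z if z > theta else 0.0
        for z, theta in zip(pre_activations, thresholds)
    ]


# Hypothetical pre-activations and per-feature thresholds for three features.
pre = [0.30, 1.20, -0.50]
theta = [0.50, 0.50, 0.10]

acts = jumprelu(pre, theta)
print(acts)  # [0.0, 1.2, 0.0]
```

Unlike a plain ReLU, a small positive pre-activation (0.30 here) is zeroed because it falls below the threshold, which is how this kind of activation keeps the number of active features per token low.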


Negative Logits

Efq    -1.16
}")    -1.11
 ſche    -1.01
$_"    -0.98
 betweenstory    -0.97
']))    -0.96
".    -0.96
 Theſe    -0.95
 dieß    -0.94
lapsingToolbar    -0.94

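Positive Logits

<eos>    0.83
(token not captured)    0.73
(token not captured)    0.68
(token not captured)    0.67
(token not captured)    0.66
↵    0.66
</td>    0.63
(token not captured)    0.59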
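The page does not say how the logit lists are produced. A common convention for feature dashboards is to project the feature's output (decoder) direction through the model's unembedding matrix and list the tokens with the most negative and most positive scores. The sketch below assumes that convention. The function name, the two-dimensional vectors, and the four-token vocabulary are hypothetical toy values, not data from this feature.

```python
def top_logit_tokens(decoder_vec, unembed_rows, vocab, k=2):
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector, then return the
    k most negative and k most positive (token, score) pairs."""
    scores = [
        sum(d * w for d, w in zip(decoder_vec, row))
        for row in unembed_rows
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example: a 2-dimensional decoder direction and a 4-token vocabulary.
decoder_vec = [1.0, -1.0]
vocab = ["a", "b", "c", "d"]
unembed_rows = [
    [0.5, 0.0],   # "a" scores  0.5
    [0.0, 0.5],   # "b" scores -0.5
    [1.0, 0.25],  # "c" scores  0.75
    [-1.0, 0.0],  # "d" scores -1.0
]

negative, positive = top_logit_tokens(decoder_vec, unembed_rows, vocab)
print(negative)  # [('d', -1.0), ('b', -0.5)]
print(positive)  # [('c', 0.75), ('a', 0.5)]
```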

Activations Density 2.592%
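To give the density figure a sense of scale, the arithmetic below converts it into an approximate token count. It assumes the density is measured over the dashboard prompts listed in the configuration (24,576 prompts of 128 tokens each). The page does not state this, so treat the result as an estimate under that assumption.

```python
# Values copied from the dashboard configuration and density line.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 2.592

# Assumption: density = (tokens where the feature fires) / (all dashboard tokens).
total_tokens = num_prompts * tokens_per_prompt
estimated_active_tokens = round(total_tokens * density_percent / 100)

print(total_tokens)             # 3145728
print(estimated_active_tokens)  # 81537
```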


No Known Activations