Explanations

the word "behind" and, to a lesser extent, words related to publishing material

oai_token-act-pair · gemini-2.0-flash

behind

np_max-act · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_4/width_16k/average_l0_88

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.4.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
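
The Architecture entry names a JumpReLU transcoder. As a rough illustration of what the "jumprelu" part usually means, here is a minimal sketch assuming the standard definition: a pre-activation passes through unchanged when it exceeds a learned per-feature threshold and is set to zero otherwise. The function name and the toy values are hypothetical and are not taken from this page or from any library.

```python
def jumprelu(pre_activations, thresholds):
    """Apply JumpReLU elementwise: keep x when x > threshold, else output 0.0."""
    if len(pre_activations) != len(thresholds):
        raise ValueError("need one threshold per feature")
    return [x if x > t else 0.0 for x, t in zip(pre_activations, thresholds)]


# Toy values, invented for illustration only (not real feature data).
example = jumprelu([0.2, 1.5, -0.3], [0.5, 0.5, 0.5])
print(example)  # [0.0, 1.5, 0.0]
```

Unlike a plain ReLU, a value of 0.2 is zeroed here even though it is positive, because it does not clear the 0.5 threshold.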

Negative Logits

" behind"      -2.50
"behind"       -2.27
"Behind"       -2.11
" Behind"      -2.06
" BEHIND"      -2.03
" derrière"    -1.94
" detrás"      -1.75
" dietro"      -1.72
"<bos>"        -1.46
" bakom"       -1.45

Positive Logits

"}`}"          0.53
" Starting"    0.51
" omge"        0.47
" zase"        0.47
" labdar"      0.47
" voet"        0.46
"vyk"          0.44
"ışık"         0.44
"dom"          0.43
" Altman"      0.43
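
The logit lists read as a ranking of output tokens by how strongly the feature pushes their logits down or up. The page does not say how the values were computed. A minimal sketch, assuming they come from projecting the feature's decoder direction through the model's unembedding matrix, might look like this. All names (`logit_effects`, `top_and_bottom`) and all numbers are hypothetical toy values.

```python
def logit_effects(decoder_vector, unembedding_rows, vocab):
    """Dot the feature's output direction with each token's unembedding vector."""
    effects = {}
    for token, row in zip(vocab, unembedding_rows):
        effects[token] = sum(d * u for d, u in zip(decoder_vector, row))
    return effects


def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
vocab = [" behind", " Starting", " the"]
unembedding_rows = [[-2.0, 0.5], [0.5, 0.0], [0.1, 0.1]]
decoder_vector = [1.0, -1.0]

negative, positive = top_and_bottom(
    logit_effects(decoder_vector, unembedding_rows, vocab), k=1
)
print(negative)  # [(' behind', -2.5)]
print(positive)  # [(' Starting', 0.5)]
```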

Activations Density 2.545%
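
Activation density is usually read as the fraction of token positions on which the feature fires. A minimal sketch of that calculation follows, together with a back-of-the-envelope count. The count assumes the 2.545% figure was measured over the dashboard prompts (24,576 prompts of 128 tokens each), which the page does not state. The function name and the toy list are hypothetical.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is above zero."""
    if not activations:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a > 0.0)
    return active / len(activations)


# Toy activations, invented for illustration: one active position out of four.
print(activation_density([0.0, 0.0, 3.1, 0.0]))  # 0.25

# Rough count of active tokens, under the assumption stated above.
total_tokens = 24576 * 128
approx_active = round(0.02545 * total_tokens)
print(total_tokens, approx_active)  # 3145728 80059
```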
