Explanations

phrases enclosed in quotation marks, and also some punctuation near words indicating aggression

oai_token-act-pair · gemini-2.0-flash

Quotation marks

np_max-act-logits · gemini-2.0-flash

Configuration

google/gemma-scope-2b-pt-transcoders/layer_19/width_16k/average_l0_12

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

16,384

Data Type

float32

Hook Name

blocks.19.ln2.hook_normalized

Architecture

jumprelu_transcoder

Context Size

1,024

Dataset

monology/pile-uncopyrighted
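
The architecture field above names a JumpReLU transcoder. As a rough illustration only, and not the code behind this dashboard, a JumpReLU activation keeps a pre-activation value when it exceeds a per-feature threshold and sets it to zero otherwise. The function names, sample values, and thresholds below are hypothetical. The sketch also counts active features, on the assumption that the average_l0_12 suffix in the configuration name refers to the average number of features active per token.

```python
def jumprelu(pre_acts, thresholds):
    """Keep each pre-activation only if it is strictly above its threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def l0(acts):
    """Count the non-zero activations (the L0 'norm') for one token."""
    return sum(1 for a in acts if a != 0.0)


# Hypothetical pre-activations and learned thresholds for four features.
pre_acts = [0.5, 1.5, -2.0, 3.0]
thresholds = [1.0, 1.0, 1.0, 2.5]

acts = jumprelu(pre_acts, thresholds)
print(acts)      # [0.0, 1.5, 0.0, 3.0]
print(l0(acts))  # 2
```

Unlike a plain ReLU, a small positive pre-activation (0.5 here) is dropped entirely rather than passed through, which is what keeps the per-token feature count low.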

Negative Logits

 themſelves

-1.12

 himſelf

-1.11

 myſelf

-1.05

 purpoſe

-1.05

 houſe

-1.04

 pleaſure

-1.04

 itſelf

-1.03

Efq

-1.00

 AssemblyCulture

-0.96

 Houſe

-0.95

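Positive Logits

[token missing in source]

0.65

[token missing in source]

0.60

Be

0.58

[token missing in source]

0.55

Don

0.55

[token missing in source]

0.54

me

0.53

[token missing in source]

0.52

Ben

0.51

Per

0.49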

Activations Density 0.775%
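
To give the density figure a sense of scale, the arithmetic below converts 0.775% into an approximate token count. It assumes, without confirmation from the page, that the density is measured over the dashboard prompts (24,576 prompts of 128 tokens each). The variable names are illustrative.

```python
# Figures taken from the configuration section of this page.
num_prompts = 24_576
tokens_per_prompt = 128
density = 0.775 / 100  # "Activations Density 0.775%"

# Assumption: density = fraction of dashboard tokens on which the feature fires.
total_tokens = num_prompts * tokens_per_prompt
estimated_active_tokens = total_tokens * density

print(total_tokens)                    # 3145728
print(round(estimated_active_tokens))  # 24379
```

Under that assumption the feature fires on roughly 24 thousand of about 3.1 million tokens.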
