Explanations

words or phrases denoting a feeling, person, or object toward which a deep affection or intense loyalty exists

oai_token-act-pair · gemini-2.0-flash

body parts

np_max-act-logits · gemini-2.0-flash

Configuration: google/gemma-scope-2b-pt-transcoders/layer_24/width_16k/average_l0_37

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.24.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
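
The Architecture and Hook Name fields above indicate a JumpReLU transcoder that reads the normalized MLP input at layer 24. As a rough illustration only, the following NumPy sketch shows what a JumpReLU transcoder forward pass could look like. The parameter names (`W_enc`, `b_enc`, `threshold`, `W_dec`, `b_dec`) and the toy dimensions are assumptions made for this example and do not describe the layout of the released weights; the real transcoder has 16,384 features.

```python
import numpy as np

def jumprelu(pre_acts, threshold):
    # JumpReLU: keep a pre-activation unchanged only where it exceeds its
    # per-feature threshold; everything else becomes zero.
    return np.where(pre_acts > threshold, pre_acts, 0.0)

def transcoder_forward(x, W_enc, b_enc, threshold, W_dec, b_dec):
    # Encode the (normalized) MLP input into sparse feature activations,
    # then decode those activations into a prediction of the MLP output.
    pre_acts = x @ W_enc + b_enc
    acts = jumprelu(pre_acts, threshold)
    prediction = acts @ W_dec + b_dec
    return acts, prediction

# Toy sizes, chosen only so the sketch runs quickly (hypothetical values).
rng = np.random.default_rng(0)
d_in, n_features, d_out = 8, 32, 8
x = rng.standard_normal((4, d_in))
W_enc = rng.standard_normal((d_in, n_features))
b_enc = np.zeros(n_features)
threshold = np.full(n_features, 1.0)
W_dec = rng.standard_normal((n_features, d_out))
b_dec = np.zeros(d_out)

acts, prediction = transcoder_forward(x, W_enc, b_enc, threshold, W_dec, b_dec)
```

The per-feature threshold is what distinguishes JumpReLU from a plain ReLU: a feature is either silent or active with a value above its threshold, which is one way to keep the number of active features per token low.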

Negative Logits

Token (quoted; a leading space is part of the token)   Logit
" lives"   -1.16
" heads"   -1.16
" minds"   -1.10
" hands"   -1.08
" Heads"   -1.05
" Hands"   -1.04
"hands"    -1.02
"Heads"    -1.02
"Lives"    -1.01
"Hands"    -0.98

Positive Logits

Token (quoted; a leading space is part of the token)   Logit
" hand"    1.15
" head"    1.09
"eye"      1.06
" heart"   1.05
" mind"    0.99
" foot"    0.93
"ear"      0.92
"arm"      0.90
" soul"    0.85
" brain"   0.83
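
The logit lists above name the output tokens this feature most suppresses and most promotes. One common way to obtain such scores is to project a feature's decoder direction through the model's unembedding matrix and sort the result; whether this dashboard computes them exactly that way is an assumption. The sketch below uses random toy matrices and a hypothetical six-token vocabulary, so its numbers are unrelated to the values listed here.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=3):
    # Project the feature's output direction onto every vocabulary token.
    effects = decoder_direction @ W_U
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy data: 4-dimensional residual stream, 6-token vocabulary.
rng = np.random.default_rng(0)
vocab = [" hand", " hands", " head", " heads", " mind", " minds"]
W_U = rng.standard_normal((4, len(vocab)))
decoder_direction = rng.standard_normal(4)

positive, negative = top_logit_effects(decoder_direction, W_U, vocab, k=2)
for token, score in positive + negative:
    print(repr(token), round(score, 2))
```

Printing tokens with `repr` keeps leading spaces visible, which matters because tokens such as " hands" and "hands" are distinct vocabulary entries.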

Activations Density: 5.008%
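
To give the density figure a sense of scale, the arithmetic below converts it into an approximate token count. It assumes that the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the dashboard prompt set listed on this page (24,576 prompts of 128 tokens each). Neither assumption is stated on the page itself.

```python
# Figures taken from this page.
n_prompts = 24_576
tokens_per_prompt = 128
density = 0.05008  # 5.008%

# Assumption: density = active tokens / total dashboard tokens.
total_tokens = n_prompts * tokens_per_prompt
approx_active_tokens = round(total_tokens * density)

print(total_tokens)          # 3145728
print(approx_active_tokens)  # 157538
```

Under those assumptions the feature would be active on roughly 1 token in 20.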

No Known Activations
