Explanations

references to copyright and licensing information

oai_token-act-pair · gpt-4o-mini Triggered by @bot

blog post metadata

np_max-act-logits · gpt-5 Triggered by @arielgoldsteinlab

boilerplate metadata and headers, including post timestamps and filing info, RSS/feed notices, comment/ping status, and copyright/license notices.

oai_token-act-pair · gpt-5 Triggered by @arielgoldsteinlab

copyright notices and publication metadata in web content.

oai_token-act-pair · deepseek-r1 Triggered by @arielgoldsteinlab

Top Features by Cosine Similarity

Comparing With GEMMA-2-9B @ 21-gemmascope-res-16k

Configuration

google/gemma-scope-9b-pt-res/layer_21/width_16k/average_l0_129

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.21.hook_resid_post
Hook Layer:
Architecture: jumprelu
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
Activation Function: relu
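The configuration lists both a jumprelu architecture and a relu activation function. As a minimal sketch of how the two differ, assuming the standard JumpReLU definition (a pre-activation passes through unchanged only when it exceeds a per-feature threshold), the functions below compare them on a few values. The threshold value and the function names are hypothetical and are not taken from this page.

```python
def relu(pre_activation: float) -> float:
    """Plain ReLU: negative values become 0, positive values pass through."""
    return max(0.0, pre_activation)


def jumprelu(pre_activation: float, threshold: float) -> float:
    """JumpReLU: the value passes through only if it exceeds the threshold."""
    return pre_activation if pre_activation > threshold else 0.0


# Hypothetical threshold chosen for illustration only.
example_threshold = 1.0

for value in (-0.5, 0.5, 1.5):
    print(value, relu(value), jumprelu(value, example_threshold))
```

The middle value shows the difference: a small positive pre-activation survives ReLU but is zeroed by JumpReLU because it does not clear the threshold.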

Negative Logits

"exprimer"  -0.36
" cooperación"  -0.36
" Behör"  -0.36
" reproducción"  -0.34
" związane"  -0.34
" universel"  -0.34
" estação"  -0.33
" kemenangan"  -0.33
" realização"  -0.33
" bersifat"  -0.32

Positive Logits

" nakalista"  1.05
" utafitiHapana"  1.00
" Numerade"  0.98
"TagMode"  0.88
" nahilalakip"  0.84
"Datuak"  0.83
"Autoritní"  0.82
"ScopeManager"  0.81
"ValueStyle"  0.79
" disambiguazione"  0.78

Activations Density 0.445%
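Reading the 0.445% activation density as the fraction of dashboard tokens on which the feature is nonzero (an assumption, since the page does not define the term), the prompt counts listed under Configuration give a rough sense of scale. The variable names below are illustrative.

```python
# Dashboard figures copied from the Configuration section.
num_prompts = 24_576
tokens_per_prompt = 128
activation_density = 0.445 / 100  # 0.445% expressed as a fraction

total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = (tokens with nonzero activation) / (all dashboard tokens).
approx_active_tokens = round(total_tokens * activation_density)

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under that reading, roughly 14,000 of about 3.1 million dashboard tokens would activate the feature.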
