
Explanations

mentions of humor and jokes, and people taking a role with humorous characteristics. (oai_token-act-pair · gemini-2.0-flash)

humor (np_max-act-logits · gemini-2.0-flash)


Configuration

google/gemma-scope-2b-pt-transcoders/layer_25/width_16k/average_l0_41

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 16,384
Data Type: float32
Hook Name: blocks.25.ln2.hook_normalized
Architecture: jumprelu_transcoder
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
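
The `jumprelu_transcoder` architecture named above can be illustrated with a small sketch. A JumpReLU unit passes its pre-activation through unchanged when it exceeds a per-feature threshold and outputs zero otherwise. The weights, bias, threshold, and function names below are hypothetical toy values chosen for illustration; they are not taken from the released checkpoint.

```python
def jumprelu(pre_activation: float, threshold: float) -> float:
    """Return the pre-activation if it exceeds the threshold, else 0."""
    return pre_activation if pre_activation > threshold else 0.0


def encode_feature(x, w_enc, b_enc, threshold):
    """Activation of a single feature for input vector x (toy, pure Python)."""
    pre = sum(xi * wi for xi, wi in zip(x, w_enc)) + b_enc
    return jumprelu(pre, threshold)


# Hypothetical 3-dimensional example; real inputs have the model's hidden size.
w_enc = [0.5, -1.0, 2.0]
b_enc = -0.1
threshold = 0.8

weak = encode_feature([1.0, 0.0, 0.0], w_enc, b_enc, threshold)    # below threshold
strong = encode_feature([1.0, 0.0, 1.0], w_enc, b_enc, threshold)  # above threshold
```

Unlike a plain ReLU, an input that is positive but below the threshold (`weak` here) is zeroed out entirely, which is one reason such features tend to fire sparsely.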


Negative Logits

" laugh"      -2.34
" humor"      -2.14
" funny"      -2.14
" laughs"     -2.13
" laughing"   -2.13
" laughed"    -2.11
" laughter"   -2.06
" humorous"   -2.06
" joke"       -2.03
" humour"     -2.00

Positive Logits

"Oso"              0.47
"getDescriptor"    0.46
" Arag"            0.46
" Supremo"         0.43
" impré"           0.43
" Hyatt"           0.42
" Paramètres"      0.42
" Streptococcus"   0.41
"ichen"            0.40
" دیکھیے"          0.40
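
One common way to obtain token lists like these is to project a feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the resulting score. Whether this dashboard computes its logits exactly this way is an assumption, and the vectors and four-token vocabulary below are hypothetical toy values. Under that reading, the large negative scores on humor-related tokens would mean the feature pushes those tokens down when it fires, while the positive side, with magnitudes below 0.5 and no obvious theme, carries much less signal.

```python
def logit_effects(decoder_dir, unembed_rows, vocab):
    """Score each token by the dot product of the decoder direction with
    that token's unembedding vector, sorted from most negative upward."""
    scores = []
    for token, row in zip(vocab, unembed_rows):
        score = sum(d * u for d, u in zip(decoder_dir, row))
        scores.append((token, score))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical 2-dimensional toy data; one unembedding row per token.
vocab = [" laugh", " joke", " Hyatt", "Oso"]
unembed_rows = [[1.0, 0.0], [0.9, 0.1], [-0.2, 0.1], [-0.2, 0.0]]
decoder_dir = [-2.0, 0.5]

ranked = logit_effects(decoder_dir, unembed_rows, vocab)
most_negative = ranked[0]   # token the feature suppresses most
most_positive = ranked[-1]  # token the feature promotes most
```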

Activations Density 1.124%
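
Activation density is usually read as the fraction of sampled tokens on which the feature is non-zero. A minimal sketch under that assumption follows. The activation values are made up, and using the full dashboard sample (24,576 prompts of 128 tokens each) as the denominator is also an assumption rather than something the page states.

```python
def activation_density(activations):
    """Fraction of tokens whose activation is strictly positive."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0.0) / len(activations)


# Made-up activations for eight tokens; two of them fire.
toy = [0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 1.2, 0.0]
toy_density = activation_density(toy)

# Rough token count implied by a 1.124% density, ASSUMING the
# denominator is every token in the dashboard sample.
total_tokens = 24_576 * 128
implied_active_tokens = round(total_tokens * 0.01124)
```

Under these assumptions the feature would be active on roughly 35 thousand of about 3.1 million sampled tokens.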
