Explanations

attends to incorrect response tokens from correct response tokens

oai_attention-head · gpt-4o-mini · triggered by @bot

Configuration

google/gemma-scope-9b-pt-att/layer_0/width_16k/average_l0_61

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

16,384

Data Type

float32

Hook Name

blocks.0.attn.hook_z

Hook Layer

0

Architecture

jumprelu

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Activation Function

relu
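
The configuration lists a `jumprelu` architecture reading from `blocks.0.attn.hook_z`. The sketch below shows how such an SAE might encode one token's `hook_z` activation. It assumes the Gemma Scope JumpReLU formulation, in which a per-feature threshold gates a ReLU'd pre-activation. The names `W_enc`, `b_enc` and `threshold`, the weight layout, and the toy sizes are illustrative assumptions, not the dashboard's code. Because `hook_z` holds one `d_head` vector per head, the sketch flattens the heads before encoding.

```python
import numpy as np

def jumprelu_encode(z, W_enc, b_enc, threshold):
    """Encode one token's hook_z activation with a JumpReLU SAE (sketch).

    z:         (n_heads, d_head) per-head attention outputs before W_O
    W_enc:     (n_heads * d_head, d_sae) encoder weights (assumed layout)
    b_enc:     (d_sae,) encoder bias
    threshold: (d_sae,) per-feature JumpReLU thresholds, assumed positive
    """
    x = z.reshape(-1)                # flatten heads into one vector
    pre = x @ W_enc + b_enc          # feature pre-activations
    gate = pre > threshold           # JumpReLU: keep only values above the threshold
    return np.where(gate, np.maximum(pre, 0.0), 0.0)

# Toy sizes for illustration only; the real SAE has 16,384 features.
rng = np.random.default_rng(0)
n_heads, d_head, d_sae = 16, 8, 32
z = rng.normal(size=(n_heads, d_head))
W_enc = rng.normal(size=(n_heads * d_head, d_sae)) / np.sqrt(n_heads * d_head)
b_enc = np.zeros(d_sae)
threshold = np.full(d_sae, 0.5)
acts = jumprelu_encode(z, W_enc, b_enc, threshold)
print(acts.shape, (acts > 0).mean())
```

Every nonzero output exceeds its threshold, so small positive pre-activations are zeroed rather than passed through as they would be under a plain ReLU.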

Head Attr Weights

Head  Weight
0     0.13
1     0.02
2     0.01
3     0.03
4     0.04
5     0.04
6     0.08
7     0.08
8     0.06
9     0.07
10    0.03
11    0.05
12    0.08
13    0.11
14    0.04
15    0.05
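
Work on attention-output SAEs commonly derives per-head weights for a `hook_z` feature this way. The feature's decoder row is split into its per-head slices, and the norms of those slices are compared. The sketch below assumes that approach and normalizes by the sum of the norms. Whether the dashboard uses exactly this normalization is an assumption. The rounded values above sum to 0.92 rather than 1.00, so it may normalize differently. The head-by-head layout of the decoder row is also assumed.

```python
import numpy as np

def head_attribution(w_dec_row, n_heads):
    """Per-head share of a hook_z feature's decoder norm (assumed method).

    w_dec_row: (n_heads * d_head,) decoder direction for one feature,
               assumed to be laid out head-by-head like flattened hook_z.
    """
    per_head = w_dec_row.reshape(n_heads, -1)   # (n_heads, d_head)
    norms = np.linalg.norm(per_head, axis=1)    # L2 norm of each head's slice
    return norms / norms.sum()                  # normalize so weights sum to 1

rng = np.random.default_rng(0)
w = rng.normal(size=16 * 8)                     # toy: 16 heads, d_head = 8
weights = head_attribution(w, n_heads=16)
for head, wt in enumerate(weights):
    print(f"{head}: {wt:.2f}")
```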

Negative Logits

Tokens are quoted so that leading spaces are visible.

Token         Logit
"<bos>"       -0.20
"the"         -0.18
" step"       -0.15
" surface"    -0.14
" side"       -0.12
" distance"   -0.12
" sides"      -0.11
" square"     -0.10
" narrow"     -0.10
" lower"      -0.10

Positive Logits

Token                  Logit
"<unused99>"           1.12
"JAKARTA"              1.05
" ſoll"                1.00
"FIFA"                 0.98
" ſua"                 0.97
(token not rendered)   0.96
" églises"             0.96
" ſeinem"              0.95
" coachTry"            0.95
" royaume"             0.95
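
Logit tables like the two above are typically produced by projecting a feature's decoder direction into the residual stream and then through the unembedding. For a `hook_z` feature, each head's slice must first pass through that head's output matrix `W_O`. The sketch below assumes TransformerLens-style shapes (`W_O`: `(n_heads, d_head, d_model)`, `W_U`: `(d_model, d_vocab)`). It ignores the final normalization and any logit soft-capping. The function name, toy vocabulary and random weights are hypothetical.

```python
import numpy as np

def feature_logit_effects(w_dec_row, W_O, W_U, k=3):
    """Direct logit effect of a hook_z feature (sketch, ignores final norm).

    w_dec_row: (n_heads * d_head,) feature decoder direction in hook_z space
    W_O:       (n_heads, d_head, d_model) per-head output projections
    W_U:       (d_model, d_vocab) unembedding
    Returns (top_positive_ids, top_negative_ids), k token ids each.
    """
    n_heads, d_head, _ = W_O.shape
    z_dir = w_dec_row.reshape(n_heads, d_head)
    resid_dir = np.einsum("hd,hdm->m", z_dir, W_O)  # sum of heads' residual-stream writes
    logits = resid_dir @ W_U                        # (d_vocab,) logit effect per token
    order = np.argsort(logits)                      # ascending
    return order[::-1][:k], order[:k]

rng = np.random.default_rng(0)
n_heads, d_head, d_model, d_vocab = 16, 8, 32, 100
w = rng.normal(size=n_heads * d_head)
W_O = rng.normal(size=(n_heads, d_head, d_model))
W_U = rng.normal(size=(d_model, d_vocab))
top_pos, top_neg = feature_logit_effects(w, W_O, W_U)
print("positive:", top_pos, "negative:", top_neg)
```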

Activation Density: 0.016%
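
Activation density is the fraction of tokens on which a feature fires, meaning its activation is above zero. As a rough scale check, assume the 0.016% figure is measured over the dashboard's 16,384 prompts of 128 tokens each. The page does not say which token set it uses, so this is an assumption. Under it, the feature would fire on only a few hundred tokens.

```python
import numpy as np

def activation_density(acts):
    """Fraction of tokens with a strictly positive feature activation."""
    return float((np.asarray(acts) > 0).mean())

# Scale check under the stated (assumed) token set.
n_tokens = 16_384 * 128                  # dashboard prompts x tokens per prompt
expected_firing = n_tokens * 0.016 / 100
print(n_tokens, round(expected_firing))  # 2097152 336

acts = np.array([0.0, 0.0, 1.3, 0.0])
print(activation_density(acts))          # 0.25
```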


No Known Activations