Explanations

expressions indicating strong emotional responses or impactful moments

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

ckkissane/attn-saes-gpt2-small-all-layers/gpt2-small_L9_Hcat_z_lr1.20e-03_l11.20e+00_ds24576_bs4096_dc1.00e-06_rsanthropic_rie25000_nr4_v9.pt

Prompts (Dashboard)

36,864 prompts, 128 tokens each

Dataset (Dashboard)

Skylion007/openwebtext

Features

24,576

Data Type

float32

Hook Name

blocks.9.attn.hook_z

Hook Layer

Architecture

standard

Context Size

128

Dataset

Skylion007/openwebtext

Activation Function

relu
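
The configuration above implies the shapes the SAE works with. The following is a rough sketch only. It assumes GPT-2 small's usual 12 attention heads of dimension 64 (not stated on this page) and that `Hcat` in the checkpoint name means the per-head `hook_z` outputs are concatenated into one input vector. The function `sae_encode` and its parameter layout are hypothetical, following one common ReLU SAE formulation rather than this checkpoint's confirmed code.

```python
# Assumed GPT-2 small attention shape (an assumption, not listed on this page).
N_HEADS = 12
D_HEAD = 64
D_SAE = 24576  # "Features" from the configuration

d_in = N_HEADS * D_HEAD           # size of the concatenated hook_z vector
expansion_factor = D_SAE // d_in  # dictionary size relative to input size


def relu(x):
    return x if x > 0.0 else 0.0


def sae_encode(z_cat, w_enc, b_enc, b_dec):
    """Hypothetical standard SAE encoder: relu((z - b_dec) @ W_enc + b_enc).

    z_cat: list of d_in floats; w_enc: d_in rows, each with one float per feature.
    """
    centered = [z - b for z, b in zip(z_cat, b_dec)]
    return [
        relu(sum(centered[i] * w_enc[i][j] for i in range(len(centered))) + b_enc[j])
        for j in range(len(b_enc))
    ]


# Toy example with 2 inputs and 3 features so the arithmetic stays visible.
toy = sae_encode(
    z_cat=[1.0, -1.0],
    w_enc=[[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]],
    b_enc=[0.0, 0.0, 0.5],
    b_dec=[0.0, 0.0],
)
print(d_in, expansion_factor, toy)
```

Under these assumptions the input width is 768 and the dictionary is 32 times wider than the input; only the first toy feature survives the ReLU.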

Head Attr Weights

0:0.06

1:0.04

2:0.09

3:0.04

4:0.06

5:0.05

6:0.24

7:0.05

8:0.06

9:0.18

10:0.03

11:0.04
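
To read the head attribution weights at a glance, they can be ranked and summed. This is a minimal sketch that copies the values listed above; treating each weight as that head's share of the feature's attribution is an assumption about what the dashboard reports.

```python
# Head attribution weights copied from the dashboard (head index -> weight).
head_weights = {
    0: 0.06, 1: 0.04, 2: 0.09, 3: 0.04, 4: 0.06, 5: 0.05,
    6: 0.24, 7: 0.05, 8: 0.06, 9: 0.18, 10: 0.03, 11: 0.04,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_two = [head for head, _ in ranked[:2]]
top_two_share = sum(weight for _, weight in ranked[:2])
total = sum(head_weights.values())

print("top heads:", top_two)
print("combined weight of top two:", round(top_two_share, 2))
print("sum of listed weights:", round(total, 2))
```

Heads 6 and 9 carry the largest weights, 0.42 combined. The listed values sum to 0.94 rather than 1.00, which may simply reflect rounding to two decimals.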

Negative Logits

Token           Logit
" Blackburn"    -3.95
"wild"          -3.94
"aido"          -3.80
"Mask"          -3.66
"patch"         -3.65
"ayne"          -3.64
"Wild"          -3.61
"Naz"           -3.53
" Marks"        -3.51
"Ez"            -3.50

Positive Logits

Token               Logit
" Gravity"          8.75
" gravity"          7.85
"gravity"           6.75
"ravity"            6.01
" gravitational"    5.91
" Grav"             5.81
" grav"             5.28
" helium"           4.43
" Galileo"          4.42
" centrif"          4.37

Activations Density 0.003%
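
For a sense of scale, the density can be converted into an approximate token count. This sketch assumes the density was measured over the dashboard prompts listed above (36,864 prompts of 128 tokens each); the page does not confirm which sample it was computed on, and the displayed 0.003% is itself rounded.

```python
n_prompts = 36_864
tokens_per_prompt = 128
density_percent = 0.003  # as displayed on the dashboard (rounded)

# Total tokens in the dashboard prompt set.
total_tokens = n_prompts * tokens_per_prompt

# Approximate number of tokens on which the feature is active,
# assuming the density applies to this prompt set.
expected_active = total_tokens * density_percent / 100

print(total_tokens)
print(round(expected_active))
```

Under that assumption, the feature would fire on roughly 142 of about 4.7 million tokens.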

No Known Activations