Explanations

references to decision-making and direction

oai_token-act-pair · gpt-4o-mini Triggered by @bot

Configuration

ckkissane/attn-saes-gpt2-small-all-layers/gpt2-small_L9_Hcat_z_lr1.20e-03_l11.20e+00_ds24576_bs4096_dc1.00e-06_rsanthropic_rie25000_nr4_v9.pt

Prompts (Dashboard): 36,864 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 24,576
Data Type: float32
Hook Name: blocks.9.attn.hook_z
Hook Layer: 9
Architecture: standard
Context Size: 128
Dataset: Skylion007/openwebtext
Activation Function: relu
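
To make the configuration concrete, here is a minimal sketch of how activations at `blocks.9.attn.hook_z` could be flattened and passed through a standard ReLU encoder. It assumes GPT-2 small's 12 heads of dimension 64 (768 values once the heads are concatenated, which is what `Hcat` in the checkpoint name suggests) and one common form of the "standard" encoder. The weights are random stand-ins, and `encode_z`, `W_enc`, `b_enc`, and `b_dec` are illustrative names rather than the checkpoint's actual parameter layout.

```python
import numpy as np

N_HEADS, D_HEAD = 12, 64      # GPT-2 small attention shape (assumed)
D_IN = N_HEADS * D_HEAD       # 768 after concatenating heads
D_SAE = 24_576                # feature count from the configuration


def encode_z(z, W_enc, b_enc, b_dec):
    """Encode per-head z activations with a standard ReLU SAE encoder.

    z has shape (n_tokens, N_HEADS, D_HEAD); returns (n_tokens, d_sae).
    """
    flat = z.reshape(z.shape[0], -1)        # concatenate heads: (n_tokens, 768)
    pre = (flat - b_dec) @ W_enc + b_enc    # pre-activations
    return np.maximum(pre, 0.0)             # ReLU


rng = np.random.default_rng(0)
# Random stand-ins for trained weights (hypothetical, not the real checkpoint)
W_enc = rng.standard_normal((D_IN, D_SAE), dtype=np.float32) * 0.01
b_enc = np.zeros(D_SAE, dtype=np.float32)
b_dec = np.zeros(D_IN, dtype=np.float32)

# One 128-token context, matching the configured context size
z = rng.standard_normal((128, N_HEADS, D_HEAD), dtype=np.float32)
features = encode_z(z, W_enc, b_enc, b_dec)
print(features.shape, features.dtype)
```

With trained weights, a single column of `features` would correspond to one dashboard feature such as the one described on this page.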

Head Attr Weights

0: 0.14
1: 0.03
2: 0.27
3: 0.09
4: 0.04
5: 0.06
6: 0.04
7: 0.05
8: 0.03
9: 0.03
10: 0.14
11: 0.03
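
As a quick reading aid, the sketch below ranks the twelve heads by the weights listed above and sums them. The displayed values add up to 0.95, which would be consistent with per-head shares rounded to two decimals, though the page does not state how the weights are normalized. The variable names are illustrative.

```python
# Head attribution weights as listed above (head index -> weight)
head_weights = {
    0: 0.14, 1: 0.03, 2: 0.27, 3: 0.09, 4: 0.04, 5: 0.06,
    6: 0.04, 7: 0.05, 8: 0.03, 9: 0.03, 10: 0.14, 11: 0.03,
}

# Rank heads from largest to smallest weight (ties keep index order)
ranked = sorted(head_weights.items(), key=lambda kv: -kv[1])
top3 = ranked[:3]

total = round(sum(head_weights.values()), 2)
top3_share = round(sum(w for _, w in top3), 2)

print("top heads:", top3)                     # [(2, 0.27), (0, 0.14), (10, 0.14)]
print("sum of listed weights:", total)        # 0.95
print("top-3 combined weight:", top3_share)   # 0.55
```

Head 2 carries the largest listed weight, with heads 0 and 10 tied for second.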

Negative Logits

" exclus": -3.17
" unden": -3.00
" remem": -2.96
" uniqueness": -2.65
" exclusion": -2.60
" ineligible": -2.55
" plaque": -2.47
" fingerprint": -2.47
" Anniversary": -2.44
"ODUCT": -2.39

Positive Logits

" direction": 3.78
" Direction": 3.70
"direction": 2.92
" directions": 2.76
" directional": 2.69
" currents": 2.67
" impulses": 2.57
"renheit": 2.56
" tempo": 2.54
"prop": 2.48

Activations Density 0.002%
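
For a sense of scale, the following back-of-the-envelope calculation converts the density into an approximate token count. It assumes the density is the fraction of dashboard tokens on which the feature is nonzero, measured over the 36,864 prompts of 128 tokens listed under Configuration, and that 0.002% is a rounded figure, so the result is only indicative.

```python
n_prompts = 36_864
tokens_per_prompt = 128
density = 0.002 / 100     # 0.002% as a fraction (rounded on the page)

total_tokens = n_prompts * tokens_per_prompt
approx_active = total_tokens * density

print(total_tokens)            # 4718592
print(round(approx_active))    # 94
```

Under these assumptions the feature fires on roughly 94 of about 4.7 million tokens.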