Explanations

phrases that express requests for feedback or assistance

oai_token-act-pair · gpt-4o-mini · Triggered by @bot

Top Features by Cosine Similarity

Comparing With GEMMA-2-2B @ 3-gemmascope-res-16k

Configuration

google/gemma-scope-2b-pt-res/layer_3/width_16k/average_l0_59

Prompts (Dashboard)

36,864 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

16,384

Data Type

float32

Hook Name

blocks.3.hook_resid_post

Hook Layer

3

Architecture

jumprelu
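The `jumprelu` architecture named above differs from a plain ReLU in that a pre-activation passes through only when it exceeds a threshold, rather than whenever it is positive. A minimal sketch in plain Python follows; the threshold `theta` and the input values are hypothetical and chosen only for illustration (a trained SAE would typically hold one learned threshold per feature).

```python
def relu(z: float) -> float:
    """Standard ReLU: pass positive values, zero out the rest."""
    return z if z > 0.0 else 0.0


def jumprelu(z: float, threshold: float) -> float:
    """JumpReLU: pass z unchanged only when it exceeds the threshold."""
    return z if z > threshold else 0.0


# Hypothetical pre-activations and threshold, for illustration only.
pre_acts = [-0.5, 0.2, 0.8, 1.5]
theta = 0.6

relu_out = [relu(z) for z in pre_acts]
jump_out = [jumprelu(z, theta) for z in pre_acts]
print(relu_out)
print(jump_out)
```

With a threshold of 0 the two functions coincide; with a positive threshold, small positive pre-activations such as 0.2 are suppressed.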

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Activation Function

relu

Negative Logits

ENEFITS

-0.51

ņas

-0.49

стоин

-0.49

ņa

-0.49

 häls

-0.48

neſs

-0.48

XVI

-0.48

يلات

-0.47

junto

-0.46

USET

-0.46

Positive Logits

GEBURTSDATUM

0.91

Diweddarwch

0.85

 betweenstory

0.80

 Gives

0.70

bewerken

0.68

Gives

0.66

 चीज़ों

0.62

Personensuche

0.62

 Giving

0.61

clusal

0.60
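Logit lists like the ones above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that is how this dashboard derives them, which the page itself does not state. The token names, vectors, and the helpers `logit_effects` and `top_logits` are hypothetical toy values, not data from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_logits(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) scores."""
    ordered = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return [token for token, _ in ordered[:k]]


# Hypothetical 2-dimensional decoder direction and toy unembedding vectors.
decoder_dir = [1.0, -0.5]
unembed = {
    "alpha": [0.6, -0.2],
    "beta": [0.5, -0.2],
    "gamma": [-0.4, 0.1],
    "delta": [-0.2, 0.2],
}

effects = logit_effects(decoder_dir, unembed)
print(top_logits(effects, k=2, positive=True))
print(top_logits(effects, k=2, positive=False))
```

In a real model the decoder direction has the residual-stream width and the unembedding covers the full vocabulary, but the ranking step is the same.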

Activations Density 0.110%
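To put the density figure in context, the short calculation below estimates how many tokens would carry a non-zero activation. It assumes the density is measured over the dashboard prompt set listed under Configuration (36,864 prompts of 128 tokens each); the page does not confirm which token set the density refers to, so treat the result as a rough estimate.

```python
# Values copied from the Configuration section of this page.
prompts = 36_864
tokens_per_prompt = 128
density = 0.110 / 100  # 0.110% expressed as a fraction

# Assumption: density is the fraction of dashboard tokens with non-zero activation.
total_tokens = prompts * tokens_per_prompt
expected_active = total_tokens * density

print(total_tokens)
print(round(expected_active))
```

Under that assumption the feature fires on roughly one token in nine hundred, which is consistent with a sparse feature.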
