Explanations

from

np_max-act · gemini-2.0-flash

technical or domain-specific terminology (jargon) — i.e., tokens from technical descriptions, code, networking, or scientific/patent language.

oai_token-act-pair · gpt-5-mini Triggered by @yooniel31

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"_funcs"            -0.07
"baum"              -0.06
" Academ"           -0.06
"Hours"             -0.06
" spoiled"          -0.06
"elves"             -0.06
"-life"             -0.06
" photographers"    -0.06
" Seconds"          -0.06
"Sampler"           -0.06

Positive Logits

"غ"                 0.06
"献"                0.06
"LEG"               0.06
"OLS"               0.06
"Embed"             0.06
"_AMD"              0.06
"tgt"               0.06
" TABLE"            0.06
"']."               0.06
" ناح"              0.06
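
Logit tables like the two above are commonly built by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects`, the array shapes, and the random toy data are all hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, unembed, k=10):
    """Return the k most negative and k most positive logit effects.

    decoder_dir: (d_model,) decoder direction of one SAE feature (assumed).
    unembed:     (d_model, vocab) unembedding matrix of the model (assumed).
    Each result is a list of (token_id, effect) pairs.
    """
    effects = decoder_dir @ unembed            # shape: (vocab,)
    order = np.argsort(effects)                # ascending by effect
    negative = [(int(i), float(effects[i])) for i in order[:k]]
    positive = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data only; these are not the real model weights.
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
decoder_dir = rng.normal(size=d_model).astype(np.float32)
unembed = rng.normal(size=(d_model, vocab)).astype(np.float32)

negative, positive = top_logit_effects(decoder_dir, unembed, k=3)
print("negative:", negative)
print("positive:", positive)
```

With values as small as the ±0.06 to ±0.07 shown here, the direct logit effect is weak, so the token lists are a loose hint about the feature rather than a reliable description.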

Activations Density 0.341%
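
An activation density is usually read as the fraction of token positions on which the feature fires. The exact definition and threshold used by this dashboard are not given, so the sketch below assumes "fires" means an activation strictly greater than zero. The helper `activation_density` and the toy array are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=np.float32)
    if acts.size == 0:
        raise ValueError("acts must contain at least one activation")
    return float(np.count_nonzero(acts > threshold) / acts.size)

# Toy example: 341 active positions out of 100,000 tokens.
toy_acts = np.zeros(100_000, dtype=np.float32)
toy_acts[:341] = 1.0

density = activation_density(toy_acts)
print(f"{density:.3%}")
```

At a density of 0.341%, the feature would fire on roughly 3 or 4 tokens per 1,024-token context on average.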

No Known Activations