Explanations

v

np_max-act · gemini-2.0-flash

The neuron fires on isolated subword fragments, especially single-character or very short tokens (such as “I”, “v”, “b”, “ge”, or control/metadata markers): fragmented pieces rather than whole words.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Tokens used for document structure, metadata, and speaker/date labels (speaker names, IDs, and numeric/date tokens).

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0
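
One way to probe the "short fragment" explanation is a crude length-based check on candidate tokens. The sketch below is illustrative only: the `is_short_fragment` helper and its two-character threshold are assumptions, not something the dashboard defines. The sample tokens are the ones quoted in the explanation, plus two whole-word tokens from the logit lists for contrast.

```python
def is_short_fragment(token: str, max_len: int = 2) -> bool:
    """Hypothetical heuristic: a token counts as a fragment if, after
    stripping surrounding whitespace, it has at most `max_len` characters."""
    stripped = token.strip()
    return 0 < len(stripped) <= max_len


# Tokens quoted in the explanation above.
quoted = ["I", "v", "b", "ge"]
# Whole-word tokens from the logit lists, used here only for contrast.
contrast = [" grammar", " clocks"]

for tok in quoted + contrast:
    print(repr(tok), is_short_fragment(tok))
```

A check like this says nothing about the competing "document structure and metadata" explanation, which depends on context rather than token length.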

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
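
The hook name and the SAE path both encode a layer index, so the two can be cross-checked. This is a minimal sketch, assuming the `blocks.<layer>.<hook>` and `..._layer_<n>/...` naming patterns seen in the configuration above. Both helper functions are hypothetical and not part of any library.

```python
import re

# Values copied from the configuration listed above.
SAE_PATH = "andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1"
HOOK_NAME = "blocks.11.hook_resid_post"


def layer_from_hook_name(hook_name: str) -> int:
    """Hypothetical helper: extract the layer index from 'blocks.<n>.<hook>'."""
    match = re.fullmatch(r"blocks\.(\d+)\.\w+", hook_name)
    if match is None:
        raise ValueError(f"unrecognized hook name: {hook_name!r}")
    return int(match.group(1))


def layer_from_sae_path(path: str) -> int:
    """Hypothetical helper: extract the layer index from a '_layer_<n>' path segment."""
    match = re.search(r"_layer_(\d+)(?:/|$)", path)
    if match is None:
        raise ValueError(f"no layer segment in path: {path!r}")
    return int(match.group(1))


hook_layer = layer_from_hook_name(HOOK_NAME)
path_layer = layer_from_sae_path(SAE_PATH)
print(hook_layer, path_layer, hook_layer == path_layer)
```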

Negative Logits

Token (quoted to show leading spaces)    Logit
"reeting"                                -0.07
"mes"                                    -0.06
" семей"                                 -0.06
" grammar"                               -0.06
" nach"                                  -0.06
" захворювання"                          -0.06
" atIndex"                               -0.06
"ifications"                             -0.06
"]!='"                                   -0.06

Positive Logits

Token (quoted to show leading spaces)    Logit
"(Utils"                                 0.07
"ø"                                      0.06
"ायन"                                    0.06
" defin"                                 0.06
" clocks"                                0.06
"[counter"                               0.06
"Eval"                                   0.06
"ừng"                                    0.06
"िनक"                                    0.06
"ılan"                                   0.06

Activations Density 0.232%
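
To put the density figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the density is the fraction of tokens on which the feature is non-zero, and that this rate applies uniformly across a 1,024-token context. Both are simplifying assumptions, not statements from the dashboard.

```python
# Figures taken from the dashboard above.
ACTIVATION_DENSITY_PERCENT = 0.232
CONTEXT_SIZE = 1024

# Assumption: density = fraction of tokens with a non-zero activation.
density = ACTIVATION_DENSITY_PERCENT / 100
expected_active_tokens = density * CONTEXT_SIZE
tokens_per_activation = 1 / density

print(f"{expected_active_tokens:.2f} active tokens per context")    # 2.38
print(f"about 1 activation every {tokens_per_activation:.0f} tokens")  # 431
```

Under these assumptions the feature would fire on roughly two to three tokens in a full context.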

No Known Activations
