Explanations

Errors

np_max-act · gemini-2.0-flash

The neuron detects the occurrence of the word “error” (notably as part of the assistant’s “If you believe this is an error…” feedback request).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard)

Various

Features

131,072

Data Type

float32

Hook Name

blocks.11.hook_resid_post

Architecture

standard

Context Size

1,024

Dataset

monology/pile-uncopyrighted

Negative Logits

影響

-0.07

_none

-0.06

abbrev

-0.06

 persists

-0.06

CENTER

-0.06

 schizophren

-0.06

 mutex

-0.06

post

-0.06

("/");↵

-0.06

 marking

-0.06

Positive Logits

 stretched

0.07

_mini

0.07

(KP

0.07

 그가

0.07

 dolor

0.06

$core

0.06

(today

0.06

/calendar

0.06

 DOWNLOAD

0.06

총

0.06
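Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the resulting value. This page does not state how its values were computed, so the following is only a sketch under that assumption. The names `top_logit_tokens`, `decoder_direction`, `unembed`, and `vocab`, along with the toy numbers, are hypothetical and do not come from the dashboard.

```python
def top_logit_tokens(decoder_direction, unembed, vocab, k=2):
    """Rank tokens by the dot product between a feature direction and
    each token's unembedding vector (assumed method, not confirmed by the page).

    decoder_direction: list of floats, one per residual dimension.
    unembed: list of per-token vectors, each the same length as decoder_direction.
    vocab: list of token strings aligned with unembed.
    """
    effects = [
        (token, sum(d * w for d, w in zip(decoder_direction, vector)))
        for token, vector in zip(vocab, unembed)
    ]
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: 2 residual dimensions, 4 tokens (all values are made up).
toy_direction = [1.0, -1.0]
toy_vocab = ["a", "b", "c", "d"]
toy_unembed = [[0.5, 0.0], [0.0, 0.5], [1.0, 0.0], [0.0, 1.0]]

negative, positive = top_logit_tokens(toy_direction, toy_unembed, toy_vocab)
print("negative:", negative)
print("positive:", positive)
```

With real model weights the same ranking would run over the full vocabulary; values as small as the ±0.06 to ±0.07 listed here would indicate only a weak direct effect on any single output token.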

Activations Density 0.002%
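An activation density is usually read as the fraction of sampled tokens on which the feature fires. The page does not define the term, so that reading is an assumption; under it, 0.002% corresponds to roughly 2 active tokens per 100,000. A minimal sketch, with a hypothetical function name and toy activations:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumes density means "share of tokens with a nonzero activation";
    the dashboard itself does not spell out its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 100,000 tokens, of which exactly 2 activate the feature.
toy_activations = [0.0] * 99_998 + [1.3, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```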

No Known Activations