Explanations

:

np_max-act · gemini-2.0-flash

explanations and descriptions of threats or predatory creatures.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8

The neuron is primarily detecting the “<|end_header_id|>” token that marks the boundary between the speaker header and the start of the assistant’s response.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

tokens that mark message/role headers or other conversational metadata (e.g., header boundary markers and the word "assistant").

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0
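
The token-based explanations above refer to the header markers of the Llama 3 chat format. As a rough illustration of where "<|end_header_id|>" sits in a rendered conversation, here is a minimal sketch. The helper names `render_llama3_turns` and `header_end_offsets` are hypothetical, and the layout is a simplified version of the chat template (it omits the default system preamble that the real Llama 3.1 template may add), so treat it as an approximation rather than the tokenizer's exact output.

```python
# Simplified sketch of the Llama 3 chat layout (assumption: no default system preamble).
# Both helper names are hypothetical and exist only for this illustration.

END_HEADER = "<|end_header_id|>"


def render_llama3_turns(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts into a Llama 3 style prompt string."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is: header start, role name, header end, blank line, content, end of turn.
        parts.append(
            f"<|start_header_id|>{message['role']}{END_HEADER}\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # The open assistant header is where the model starts its response.
        parts.append(f"<|start_header_id|>assistant{END_HEADER}\n\n")
    return "".join(parts)


def header_end_offsets(text, marker=END_HEADER):
    """Return the character offsets at which the header-end marker occurs."""
    offsets = []
    start = text.find(marker)
    while start != -1:
        offsets.append(start)
        start = text.find(marker, start + len(marker))
    return offsets


prompt = render_llama3_turns([{"role": "user", "content": "Hi"}])
print(header_end_offsets(prompt))
```

Each offset marks a boundary between a role header and the text that follows it; the last one precedes the assistant's response, which is the position the second explanation describes.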

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1
Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token          Logit
"ior"          -0.09
"速"           -0.07
"urrenc"       -0.06
"YÖ"           -0.06
"VT"           -0.06
"iological"    -0.06
" Bourbon"     -0.06
" Billy"       -0.06
" antenna"     -0.06

Positive Logits

Token            Logit
" klar"          0.07
" /*!↵"          0.06
"字符串"         0.06
" Clash"         0.06
",\↵"            0.06
".Application"   0.06
" choix"         0.06
"begin"          0.06
" Tang"          0.06

Activations Density 0.073%
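
To give the density figure some scale, the sketch below converts it into an expected number of activating tokens per context. It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation, and it reuses the context size of 1,024 listed in the configuration; the function name `expected_active_tokens` is hypothetical.

```python
# Back-of-the-envelope reading of the density figure.
# Assumption: density = fraction of tokens with a nonzero activation for this feature.

def expected_active_tokens(density_percent, context_size):
    """Expected number of tokens per context on which the feature fires."""
    return (density_percent / 100.0) * context_size


expected = expected_active_tokens(0.073, 1024)
print(f"{expected:.2f} activating tokens per 1,024-token context on average")
```

Under that assumption the feature fires on fewer than one token per context on average, which is consistent with a feature tied to a small set of specific tokens.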

No Known Activations
