Explanations

assistant

np_max-act · gemini-2.0-flash

mentions of the “Assistant” role label or references to the AI assistant in chat/transcript-style formatting.

oai_token-act-pair · gpt-5 Triggered by @vetterc0

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
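The configuration lists the architecture as "standard". The sketch below shows what a standard ReLU sparse autoencoder computes, assuming the common formulation in which the decoder bias is subtracted before encoding and added back after decoding. Whether this exact variant was used for this trainer is an assumption, not something the dashboard states. The function names and toy dimensions are hypothetical; for this SAE, d_sae would be 131,072 and the input would be the residual stream read at blocks.7.hook_resid_post.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major nested lists


def encode(x: Vector, W_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Feature activations f = ReLU((x - b_dec) @ W_enc + b_enc).

    W_enc has shape [d_model][d_sae]. Subtracting b_dec first is an assumed
    convention, not something the dashboard states.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [
        sum(centered[i] * W_enc[i][j] for i in range(len(centered))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return [max(0.0, p) for p in pre]  # ReLU keeps only positive pre-activations


def decode(f: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = f @ W_dec + b_dec, with W_dec of shape [d_sae][d_model]."""
    return [
        sum(f[j] * W_dec[j][i] for j in range(len(f))) + b_dec[i]
        for i in range(len(b_dec))
    ]


# Hypothetical toy sizes: d_model = 2, d_sae = 3 (the real SAE has d_sae = 131,072).
W_enc = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
b_enc = [0.0, 0.0, -0.5]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_dec = [0.0, 0.0]

f = encode([1.0, -2.0], W_enc, b_enc, b_dec)
print(f)                        # [1.0, 0.0, 0.0]
print(decode(f, W_dec, b_dec))  # [1.0, 0.0]
```

Only the first toy feature survives the ReLU here, which mirrors how a dashboard feature is "active" on a token only when its pre-activation is positive.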

Negative Logits

"LG"  -0.08
"地域"  -0.07
" geopolitical"  -0.07
"gin"  -0.06
" Feng"  -0.06
" photon"  -0.06
"KDE"  -0.06
" liqu"  -0.06
" انقل"  -0.06
" energie"  -0.06

Positive Logits

" Assistant"  0.13
" assistant"  0.11
"Assistant"  0.10
"ificant"  0.08
"맞"  0.07
" clerk"  0.07
"アン"  0.07
"sters"  0.07
" suitability"  0.07
" assistants"  0.06
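Logit tables like the two above are typically produced by projecting the feature's decoder direction onto the model's unembedding matrix and ranking tokens by the result. A minimal sketch, assuming the scores are the plain dot product of the decoder row with each token's unembedding vector; whether the dashboard applies the final normalization first is not stated here. The helper names and the four-token vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_scores(decoder_row: List[float], W_U: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder row with
    that token's unembedding vector (assumes no final normalization is applied)."""
    return {tok: sum(d * u for d, u in zip(decoder_row, col)) for tok, col in W_U.items()}


def top_tokens(scores: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """Return (k most positive tokens, k most negative tokens), strongest first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # reverse so the most negative score comes first
    return positive, negative


# Hypothetical toy data: d_model = 2, four-token vocabulary.
decoder_row = [1.0, 0.5]
W_U = {
    "a": [0.2, 0.0],
    "b": [0.0, 0.2],
    "c": [-0.3, 0.0],
    "d": [0.0, -0.1],
}
scores = logit_scores(decoder_row, W_U)
print(top_tokens(scores, 2))  # (['a', 'b'], ['c', 'd'])
```

Under this reading, a positive score means that adding the feature's decoder direction to the residual stream pushes the model toward predicting that token, which is consistent with " Assistant" and " assistant" heading the positive list.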

Activation Density: 0.028%
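Activation density is the fraction of dataset tokens on which the feature's activation is positive. The sketch below assumes that definition and a ReLU-style SAE in which inactive tokens have activation exactly 0; the helper names are hypothetical. It also turns the displayed 0.028% into a rough per-context figure: at the configured context size of 1,024 tokens, and treating every position as equally likely to fire (a simplification), the feature would be expected on about 0.29 tokens per context.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens whose feature activation is strictly positive.

    Assumes a ReLU-style SAE, where inactive tokens have activation exactly 0.
    """
    if not activations:
        raise ValueError("need at least one token")
    fired = sum(1 for a in activations if a > 0.0)
    return fired / len(activations)


def expected_firings_per_context(density: float, context_size: int) -> float:
    """Expected active tokens per context, treating positions as interchangeable."""
    return density * context_size


toy = [0.0, 0.0, 0.5, 0.0, 1.2]  # hypothetical activations on five tokens
print(activation_density(toy))  # 0.4
print(round(expected_firings_per_context(0.00028, 1024), 2))  # 0.29
```

Because the displayed density is rounded, the per-context estimate is only approximate.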

No Known Activations