Explanations

"

np_max-act · gemini-2.0-flash

This neuron spots the closing quote‐and‐bracket sequence (“]”) that marks the end of the user’s placeholder for “your answer” in toxic‐speech instructions.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
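The configuration values above can be captured in a small typed record, which makes it easier to check them programmatically. The following is a minimal sketch only: the `SaeConfig` class, its field names, and the `layer` helper are hypothetical names introduced for illustration and are not part of any library; the values themselves are copied from the listing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    """Hypothetical container for the configuration shown on this page."""
    sae_id: str
    hook_name: str
    n_features: int
    dtype: str
    architecture: str
    context_size: int
    dataset: str

    @property
    def layer(self) -> int:
        # Assumes the hook name follows the "blocks.<N>.hook_resid_post" pattern.
        match = re.fullmatch(r"blocks\.(\d+)\.hook_resid_post", self.hook_name)
        if match is None:
            raise ValueError(f"unexpected hook name: {self.hook_name!r}")
        return int(match.group(1))


# Values copied from the configuration listing.
CONFIG = SaeConfig(
    sae_id="andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1",
    hook_name="blocks.11.hook_resid_post",
    n_features=131_072,
    dtype="float32",
    architecture="standard",
    context_size=1_024,
    dataset="monology/pile-uncopyrighted",
)

if __name__ == "__main__":
    print(CONFIG.layer, CONFIG.n_features)
```

Parsing the layer index out of the hook name gives a quick consistency check against the `resid_post_layer_11` segment of the identifier.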

Negative Logits

" програми"    -0.06
" Musical"     -0.06
"Informe"      -0.06
"_gift"        -0.06
" десят"       -0.06
" upbeat"      -0.06
"another"      -0.06
"atar"         -0.06
" علت"         -0.06
"ường"         -0.06

Positive Logits

"ольз"         0.08
".toJson"      0.06
"]]"           0.06
" lose"        0.06
"千"           0.06
"okes"         0.06
"mA"           0.06
".Left"        0.06
" FOREIGN"     0.06
".left"        0.06

Activations Density 0.000%
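As a rough illustration of how a density figure like the one above could be computed, the sketch below treats density as the fraction of token positions where the feature's activation exceeds a threshold. This definition, the function names, and the default threshold of zero are assumptions made for illustration; the page does not state how the dashboard calculates the value.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation is strictly above `threshold`.

    Assumed definition for illustration; returns 0.0 for an empty sequence.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density: float) -> str:
    """Render a density in [0, 1] as a percentage with three decimals."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    # A feature that never fires on the sampled tokens.
    silent = [0.0] * 1000
    print(format_density(activation_density(silent)))
```

Under this assumed definition, a feature that never fires on the sampled tokens reports a density of zero.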

No Known Activations