Explanations

Pressure, science

np_max-act · gemini-2.0-flash

This neuron detects the assistant’s speaker‐label marker (the “<|start_header_id|>” token indicating an assistant response).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"oked": -0.07
"_TAB": -0.07
"_pose": -0.07
"persona": -0.07
" setters": -0.06
" negotiate": -0.06
"utsche": -0.06
"hash": -0.06
" acad": -0.06
" Plum": -0.06

Positive Logits

" prostřednictvím": 0.07
"므로": 0.07
" enlarge": 0.06
" Circular": 0.06
".Alert": 0.06
"Thank": 0.06
" Factors": 0.06
".innerHTML": 0.06
" použití": 0.06
" sposób": 0.06

Activations Density: 0.032%
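
Activation density is usually read as the fraction of sampled token positions on which the feature fires. A minimal sketch under that assumption follows; the function name `activation_density`, the zero threshold, and the toy activations are hypothetical and are not taken from the dashboard.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds the threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return active / total

# Toy sample (hypothetical): 32 active positions out of 100,000 tokens.
toy_activations = [1.5] * 32 + [0.0] * 99_968
density = activation_density(toy_activations)
density_percent = f"{density * 100:.3f}%"
print(density_percent)
```

Under this reading, a density of 0.032% corresponds to the feature firing on roughly 3 tokens in every 10,000.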

No Known Activations