
Explanations

instructions

np_max-act · gemini-2.0-flash

The neuron fires on the key “object-and-parameter” words in step-by-step instructions: terms like “module,” “new,” and “name” that label what is being renamed or configured.

oai_token-act-pair · o4-mini · Triggered by @xinyanhu8


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted


Negative Logits

 leaned  -0.07
 Puzzle  -0.07
Ped  -0.07
dummy  -0.07
 covid  -0.06
 پاد  -0.06
MUX  -0.06
nin  -0.06
 That  -0.06
Ts  -0.06

Positive Logits

 генера  0.07
"/>↵  0.07
.pick  0.07
ondere  0.07
цієн  0.07
.tom  0.07
 review  0.07
 одну  0.06
菌  0.06
#\  0.06

Activations Density 0.202%
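
To give a feel for what a density of 0.202% means in practice, the short sketch below converts it into tokens per context window. It assumes that "activation density" is the fraction of tokens on which the feature has a nonzero activation, and it uses the 1,024-token context size listed under Configuration; the function names are hypothetical and not part of any dashboard API.

```python
def expected_active_tokens(density_percent: float, context_size: int) -> float:
    """Expected number of tokens per context on which the feature is active.

    Assumption: density_percent is the percentage of tokens with a
    nonzero activation for this feature.
    """
    return density_percent / 100.0 * context_size


def mean_tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens seen per activating token (reciprocal of density)."""
    return 100.0 / density_percent


if __name__ == "__main__":
    density_percent = 0.202   # from the dashboard
    context_size = 1024       # from the Configuration block

    per_context = expected_active_tokens(density_percent, context_size)
    per_activation = mean_tokens_per_activation(density_percent)

    print(f"Expected active tokens per context: {per_context:.2f}")
    print(f"Tokens per activation on average:   {per_activation:.0f}")
```

Under these assumptions the feature would be active on roughly two tokens in each 1,024-token context, or about one token in every 495.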
