Explanations

gradient

np_max-act · gemini-2.0-flash

This neuron fires on occurrences of “gradient” (especially in “gradient descent”), i.e. it recognizes gradient-related terminology.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" mobs": -0.08
"voucher": -0.07
"ubuntu": -0.07
"umor": -0.06
"-abs": -0.06
"告": -0.06
"richText": -0.06
"ulators": -0.06

Positive Logits

" innovative": 0.07
"(Sprite": 0.07
".HORIZONTAL": 0.07
"�": 0.07
" estamos": 0.06
" dejtings": 0.06
" habil": 0.06
"Coordinates": 0.06
"rou": 0.06
" ослож": 0.06

Activations Density 0.002%
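
The density figure is commonly read as the fraction of token positions on which the feature is active. A minimal sketch under that assumption follows. The function name `activation_density`, the zero threshold, and the toy activations are hypothetical and not taken from the dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    acts = np.asarray(acts, dtype=np.float32)
    if acts.size == 0:
        return 0.0
    return float(np.count_nonzero(acts > threshold) / acts.size)

# Toy data: 2 active positions out of 100,000 tokens.
toy = np.zeros(100_000, dtype=np.float32)
toy[[10, 500]] = [3.2, 1.1]
density = activation_density(toy)
print(f"{density:.3%}")
```

At a density of 0.002%, the feature would be active on roughly 2 tokens in every 100,000, which fits a feature tied to a single narrow term.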

No Known Activations