Explanations

self-care

np_max-act · gemini-2.0-flash

The neuron selectively activates on self-referential and second-person advice terms—especially words like “you,” “your,” “self,” and “self-care.”

oai_token-act-pair · o4-mini · Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
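The configuration lists a "standard" architecture with 131,072 features reading from blocks.11.hook_resid_post. The sketch below shows what a standard ReLU SAE typically computes at that hook. It assumes the common form f = ReLU((x - b_dec) W_enc + b_enc) with reconstruction x_hat = f W_dec + b_dec. This page does not confirm the exact bias handling in the andyrdt/saes-llama-3.1-8b-instruct checkpoints. Toy sizes stand in for the real ones (d_model 4096 for Llama 3.1 8B, 131,072 features), and all weights are random placeholders.

```python
import numpy as np

# Toy sizes (hypothetical stand-ins): the real SAE would use d_model=4096
# for Llama 3.1 8B and d_sae=131_072 features.
D_MODEL, D_SAE = 16, 64

def sae_encode(x, W_enc, b_enc, b_dec):
    """Assumed standard ReLU encoder: f = ReLU((x - b_dec) @ W_enc + b_enc)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def sae_decode(f, W_dec, b_dec):
    """Linear decoder: x_hat = f @ W_dec + b_dec."""
    return f @ W_dec + b_dec

# Random placeholder weights; a real run would load the trained SAE instead.
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(D_MODEL, D_SAE))
W_dec = rng.normal(scale=0.1, size=(D_SAE, D_MODEL))
b_enc = np.full(D_SAE, -0.05)  # toy negative encoder bias
b_dec = np.zeros(D_MODEL)

# Three token positions of (toy) residual-stream activations at the hook.
x = rng.normal(size=(3, D_MODEL))
f = sae_encode(x, W_enc, b_enc, b_dec)   # sparse, non-negative feature activations
x_hat = sae_decode(f, W_dec, b_dec)      # reconstruction of x
print(f.shape, x_hat.shape)  # (3, 64) (3, 16)
```

Each column of f corresponds to one feature. On this page, the "self-care" latent is one of the 131,072 columns.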

Negative Logits

"tah"      -0.07
"anc"      -0.07
" Coch"    -0.06
" план"    -0.06
"浜"       -0.06
" tant"    -0.06
"coupon"   -0.06
"äter"     -0.06
" هذه"     -0.06
"_."       -0.06

Positive Logits

" skyline"     0.06
" Venture"     0.06
" колич"       0.06
" Brave"       0.06
"toDouble"     0.06
" hides"       0.06
" Primary"     0.06
"(KeyCode"     0.06
"Kai"          0.06
".isSelected"  0.06

Activations Density 0.033%
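An activation density of 0.033% means the feature fires on roughly one token position in 3,000. The sketch below shows the computation. It assumes density is the fraction of token positions with an activation above zero. The page states neither the threshold nor the token sample used, and the activations below are invented for illustration.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Hypothetical activations over 100,000 token positions, 33 of them nonzero.
acts = np.zeros(100_000)
acts[:33] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.033%
```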
