Explanations

period
np_max-act · gemini-2.0-flash

The neuron activates on polite expressions of gratitude, e.g. “thank you,” “thanks,” and related thank-you phrases.
oai_token-act-pair · o4-mini · Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" FALSE"    -0.06
" pork"     -0.06
" titles"   -0.06
" Tate"     -0.06
"ampton"    -0.06
" offices"  -0.06
" helper"   -0.06
" bank"     -0.06
"azar"      -0.06
" CASE"     -0.06

Positive Logits

" underst"      0.06
" йому"         0.06
" karak"        0.06
"-HT"           0.06
" clap"         0.06
"Mid"           0.06
".'."           0.06
"Ost"           0.06
" foreseeable"  0.06
" Reload"       0.06
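
The negative and positive logit lists above rank vocabulary tokens by how much the feature pushes their output logits down or up. A common way to get such a ranking is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below illustrates that idea on toy data. It is an assumption about how such lists are usually built, not this dashboard's confirmed method; the names `logit_effects` and `top_k`, the two-dimensional vectors, and the placeholder tokens are all hypothetical, and the final layer norm is ignored.

```python
def logit_effects(decoder_vec, unembed_rows, vocab):
    """Return (token, effect) pairs, where effect is the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    effects = []
    for token, row in zip(vocab, unembed_rows):
        effect = sum(d * u for d, u in zip(decoder_vec, row))
        effects.append((token, effect))
    return effects


def top_k(effects, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example (hypothetical values): a 2-dimensional residual stream, 4 tokens.
decoder_vec = [1.0, -1.0]
vocab = ["a", "b", "c", "d"]
unembed_rows = [
    [0.5, 0.0],    # "a"
    [0.0, 0.5],    # "b"
    [0.25, 0.25],  # "c"
    [1.0, 0.0],    # "d"
]

negative, positive = top_k(logit_effects(decoder_vec, unembed_rows, vocab), k=2)
print("negative:", negative)
print("positive:", positive)
```

When every listed value has the same small magnitude, as in the lists above (all ±0.06), the feature has little direct influence on any particular output token, so the token lists say little about what the feature represents.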

Activations Density 0.075%
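
An activation density of 0.075% is most naturally read as the share of token positions on which the feature fires at all. The sketch below shows that calculation on toy data, assuming "fires" means an activation strictly above zero. The function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): the feature fires on 3 of 4,000 token positions.
toy_activations = [0.0] * 3997 + [1.2, 0.4, 2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

At this rate the feature would fire on roughly one token in every 1,300, which is sparse enough that a small sample of text may contain no activating examples at all.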
