Explanations

promotional language

np_max-act · gemini-2.0-flash

The neuron detects promotional or sales-style hype language: words used to tout or market something as exciting, exclusive, or superior.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

SAE: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1
Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Logit  Token
-0.07  办公
-0.07   provision
-0.07   trend
-0.07  username
-0.07   owners
-0.07  QA
-0.06   spéc
-0.06   rifles
-0.06  pink
-0.06  pä

Positive Logits

Logit  Token
0.08  MASConstraintMaker
0.07  ----------↵
0.07  ">&#
0.06   hấp
0.06  xcb
0.06  (('
0.06  ')));↵
0.06  ostringstream
0.06  _First
0.06  _GO
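As a rough illustration of how per-token logit lists like the ones above are commonly derived, the sketch below projects a single feature's decoder direction through an unembedding matrix and ranks the results. This is an assumption about the usual convention, not a confirmed description of how this dashboard computes its values. The function names, the toy weights, and the token labels are all hypothetical.

```python
def logit_effects(decoder_direction, unembed):
    """Project one feature's decoder direction through the unembedding.

    decoder_direction: list of floats, length d_model (hypothetical toy size).
    unembed: d_model x vocab nested list; unembed[i][j] maps residual
             dimension i to vocabulary token j.
    Returns one logit contribution per vocabulary token.
    """
    d_model = len(decoder_direction)
    vocab = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab)
    ]


def top_and_bottom(effects, tokens, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    most_positive = ranked[-k:][::-1]
    most_negative = ranked[:k]
    return most_positive, most_negative


# Toy example with made-up weights: 2 residual dimensions, 3 tokens.
direction = [1.0, -1.0]
unembed = [
    [0.5, -0.2, 0.1],
    [0.1, 0.3, 0.1],
]
tokens = ["a", "b", "c"]

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, tokens, k=1)
print("most promoted:", positive[0][0])
print("most suppressed:", negative[0][0])
```

Under this reading, small magnitudes such as the values near 0.07 listed here would indicate that the feature has little direct effect on any particular output token.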

Activations Density 0.103%
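To make the density figure concrete, here is a minimal sketch that treats activation density as the fraction of token positions where the feature's activation is above zero. That definition is an assumption about what the dashboard reports. The function name, the threshold parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    activations: list of per-token activation values for one feature.
    threshold: a position counts as active if its activation exceeds this
               (assumed to be 0.0 here).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 1,000 tokens.
toy_activations = [0.0] * 999 + [2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Read this way, a density of 0.103% would mean the feature is active on roughly one token in a thousand.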