Explanations

suppressing information

np_max-act · gemini-2.0-flash

The neuron fires on words denoting acts of suppression or silencing (e.g., “crush,” “suppress,” “conceal,” “silencing”).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

_bit

-0.07

效

-0.06

','',

-0.06

Expired

-0.06

以

-0.06

serde

-0.06

 toItem

-0.06

.branch

-0.06

ValuePair

-0.06

 дис

-0.06

Positive Logits

fx

0.07

 labels

0.06

лению

0.06

 Veterans

0.06

commerce

0.06

izen

0.06

 Prim

0.06

 dovol

0.06

 assertion

0.06

 refrigerator

0.06

Activations Density 0.020%
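
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the fraction of sampled token positions on which the feature's activation is greater than zero; the dashboard does not state its exact definition. The function name `activation_density` and the toy data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations):
    """Fraction of token positions where the feature is active (> 0).

    Assumption: "active" means a strictly positive activation value.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Toy data (not from the dashboard): 10,000 token positions, 2 of them active.
toy = [0.0] * 10_000
toy[17] = 1.3
toy[4242] = 0.4

density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.020%
```

Under this reading, a density of 0.020% corresponds to the feature firing on roughly 1 in 5,000 tokens.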

No Known Activations