Explanations

least

np_max-act · gemini-2.0-flash

The neuron activates on normative requirement language: phrases stating what “must” or “should” be done, or what must be present “at least”.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various

Features: 131,072

Data Type: float32

Hook Name: blocks.11.hook_resid_post

Architecture: standard

Context Size: 1,024

Dataset: monology/pile-uncopyrighted

Negative Logits

" Best"        -0.07
" employees"   -0.07
" आई"          -0.07
"User"         -0.07
" внимание"    -0.06
"-plane"       -0.06
" owning"      -0.06
"Hide"         -0.06
" Employees"   -0.06
"Pi"           -0.06

Positive Logits

"чи"            0.07
"ING"           0.07
" disparate"    0.07
"инг"           0.07
" extravag"     0.07
" صند"          0.06
"rico"          0.06
"cer"           0.06
" territorial"  0.06
"제"            0.06

Activations Density 0.016%
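
For a sense of scale, a density figure like this can be read as the share of token positions on which the feature fires. The sketch below is illustrative only: it assumes density means "fraction of sampled tokens with activation above zero", which this page does not state, and the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is defined as the share of active tokens; the
    dashboard does not spell out its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 16 active token positions out of 100,000.
sample = [0.0] * 99_984 + [1.5] * 16
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.016% corresponds to roughly 16 active tokens per 100,000 sampled.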

No Known Activations