Explanations

-Americans

np_max-act · gemini-2.0-flash

The neuron is picking out numerical values and statistical measures (digits, decimal numbers, and similar quantitative tokens).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
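
The configuration values above can be collected into a small record for scripting. The following is a minimal sketch, not any library's API: the `SaeConfig` class and the `hook_layer` helper are hypothetical names introduced here, and the assumption that the hook name follows the `blocks.<layer>.hook_<site>` pattern is inferred only from the single value shown on this page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    """Hypothetical container for the values listed under Configuration."""
    sae_id: str
    hook_name: str
    n_features: int
    context_size: int
    dtype: str
    dataset: str


def hook_layer(hook_name: str) -> int:
    """Extract the layer index from a name like 'blocks.11.hook_resid_post'."""
    match = re.fullmatch(r"blocks\.(\d+)\.hook_\w+", hook_name)
    if match is None:
        raise ValueError(f"unrecognized hook name: {hook_name!r}")
    return int(match.group(1))


config = SaeConfig(
    sae_id="andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1",
    hook_name="blocks.11.hook_resid_post",
    n_features=131_072,
    context_size=1_024,
    dtype="float32",
    dataset="monology/pile-uncopyrighted",
)

# The layer parsed from the hook name should agree with the SAE identifier.
layer = hook_layer(config.hook_name)
print(layer, f"layer_{layer}" in config.sae_id)
```

The check at the end cross-validates the two places the page mentions the layer: the hook name and the `resid_post_layer_11` path segment.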

Negative Logits

unsafe          -0.06
 engr           -0.06
 pairing        -0.06
.MiddleCenter   -0.06
 gatherings     -0.06
SOC             -0.06
 improvis       -0.06
 jíd            -0.06
 spreadsheet    -0.06
 glaciers       -0.06

Positive Logits

uesta           0.07
叔              0.06
 Гол            0.06
šem             0.06
 choix          0.06
.↵              0.06
بي              0.06
(square         0.06
","             0.06
[target         0.06
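
The page does not say how the negative and positive logit lists are computed. One common approach for SAE features is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below illustrates that idea on a toy three-token vocabulary. It assumes this is the method used (the page does not confirm it), ignores any final normalization layer, and uses hypothetical names (`feature_logit_effects`, `top_tokens`) and made-up numbers throughout.

```python
def feature_logit_effects(decoder_dir, unembed):
    """Dot the decoder direction (length d) with each column of unembed (d x vocab)."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_tokens(effects, tokens, k, largest=True):
    """Return the k (token, effect) pairs with the largest or smallest effect."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1], reverse=largest)
    return ranked[:k]


# Toy data, not taken from the dashboard: d = 2, vocabulary of 3 tokens.
tokens = ["a", "b", "c"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.3, -0.1, 0.0],
    [0.0, 0.2, -0.4],
]

effects = feature_logit_effects(decoder_dir, unembed)
positive = top_tokens(effects, tokens, k=1, largest=True)
negative = top_tokens(effects, tokens, k=1, largest=False)
print(positive[0][0], negative[0][0])
```

Under this reading, values as small and uniform as the ±0.06 listed above would indicate that the feature has little direct effect on any particular output token.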

Activations Density 0.072%
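
The dashboard does not define activation density. A common reading is the fraction of sampled tokens on which the feature fires at all, so 0.072% would correspond to roughly 72 firing tokens per 100,000. The sketch below works through that arithmetic on synthetic data. The function name `activation_density` and the zero threshold are assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic activations: 100,000 tokens, 72 of which fire.
acts = [0.0] * 100_000
for i in range(72):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```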

No Known Activations
