Explanations

writing and fun

np_max-act · gemini-2.0-flash

This neuron detects adjectives and adverbs that specify the assistant’s desired tone or style (e.g., “funny,” “edgy,” “like”).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" copied"     -0.06
"xfe"         -0.06
" Cleans"     -0.06
"_material"   -0.06
"opic"        -0.05
"quirer"      -0.05
" obten"      -0.05
" Гол"        -0.05
" Kiş"        -0.05
"enville"     -0.05

Positive Logits

"sea"         0.08
" những"      0.07
" hedge"      0.07
"是一个"       0.07
".states"     0.06
" Điều"       0.06
" enim"       0.06
"*'"          0.06
" زمینه"      0.06
"July"        0.06
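
For orientation, lists like the positive and negative logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. Whether this dashboard computes them exactly this way is an assumption. The sketch below uses toy NumPy arrays; `top_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names for illustration, not part of any library.

```python
import numpy as np

def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most positive and k most negative per-token logit effects.

    decoder_dir: (d_model,) feature direction in the residual stream (assumed).
    unembed:     (d_model, n_vocab) unembedding matrix (assumed).
    vocab:       list of n_vocab token strings.
    """
    effects = decoder_dir @ unembed           # shape (n_vocab,)
    order = np.argsort(effects)               # ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data: 4-dimensional residual stream, 6-token vocabulary.
vocab = ["sea", " copied", "the", " hedge", "opic", "July"]
unembed = np.zeros((4, 6))
unembed[0] = [0.08, -0.06, 0.0, 0.07, -0.05, 0.01]
decoder_dir = np.array([1.0, 0.0, 0.0, 0.0])

positive, negative = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print(positive)
print(negative)
```

With real weights, `decoder_dir` would be one row of the SAE decoder and `unembed` the language model's output projection; the small magnitudes listed above (all under 0.1) would then correspond to entries of `effects`.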

Activations Density 0.003%
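
As a rough illustration of what a density of 0.003% means, the sketch below treats activation density as the fraction of token positions on which the feature fires (activation above zero). That definition is an assumption here, and `activation_density` is a hypothetical helper, not a function from any library.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    if acts.size == 0:
        return 0.0
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Toy example: the feature fires on 3 of 100,000 token positions.
toy_acts = np.zeros(100_000, dtype=np.float32)
toy_acts[[10, 2_000, 77_777]] = [0.5, 1.2, 0.3]

density = activation_density(toy_acts)
print(f"{density:.3%}")  # prints 0.003%
```

Under this reading, the feature is active on roughly 3 tokens per 100,000, which is very sparse.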

No Known Activations