Explanations

first-person pronoun

np_max-act · gemini-2.0-flash

This neuron activates on tokens in the model’s own (assistant-generated) replies rather than on tokens in user text.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard)

Various

Features

131,072

Data Type

float32

Hook Name

blocks.11.hook_resid_post

Architecture

standard

Context Size

1,024

Dataset

monology/pile-uncopyrighted
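
As a rough illustration of the configuration above, here is a minimal sketch of a ReLU sparse autoencoder applied to a residual-stream vector. It assumes that the "standard" architecture means the common form f = ReLU(W_enc (x - b_dec) + b_enc) with reconstruction x_hat = W_dec^T f + b_dec; the page itself does not define the term. All weights and dimensions below are toy values chosen for readability, not the real 131,072-feature dictionary.

```python
# Toy sketch of a "standard" (ReLU) sparse autoencoder.
# ASSUMPTION: "standard" = ReLU encoder with a shared decoder bias;
# the names W_enc, b_enc, W_dec, b_dec are illustrative, not from the page.

def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(x, W_enc, b_enc, b_dec):
    """Feature activations: ReLU(W_enc @ (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre_acts]

def decode(f, W_dec, b_dec):
    """Reconstruction: b_dec plus each decoder row scaled by its activation."""
    out = list(b_dec)
    for activation, row in zip(f, W_dec):
        for j, w in enumerate(row):
            out[j] += activation * w
    return out

# Toy dimensions: 2-dimensional "residual stream", 3 features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # one row per feature
b_enc = [0.0, 0.0, 0.0]
b_dec = [0.0, 0.0]

x = [2.0, -1.0]
features = encode(x, W_enc, b_enc, b_dec)
recon = decode(features, W_dec, b_dec)
print(features, recon)
```

In this toy case only the first feature fires, so the reconstruction keeps only the component that feature's decoder row can express. A real SAE at this hook would read its input from `blocks.11.hook_resid_post` and have far more features than input dimensions.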

Negative Logits

SG

-0.07

_sync

-0.06

THOOK

-0.06

^^^^

-0.06

تين

-0.06

currentIndex

-0.06

 Bukkit

-0.06

 Пред

-0.06

-slot

-0.06

ska

-0.06

Positive Logits

 Catalonia

0.07

 salt

0.06

ーテ

0.06

(types

0.06

DataMember

0.06

 lạnh

0.06

asad

0.06

 discussions

0.06

áb

0.06

aday

0.06

Activations Density 0.015%
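
For context, a density figure like the one above is usually read as the fraction of token positions on which the feature is active. The sketch below assumes that reading (activation strictly greater than zero counts as active); the function name and the toy data are hypothetical and are not taken from this dashboard.

```python
# Hedged sketch: activation density as the fraction of tokens where a
# feature fires. ASSUMPTION: "fires" means activation > threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Made-up data: 3 active tokens out of 20,000.
toy_activations = [0.0] * 19_997 + [1.2, 0.4, 3.1]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

At a density this low, the feature is silent on almost every token, which fits a dashboard that lists no known activations.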

No Known Activations
