Explanations

pronouns referring to people

np_max-act · gemini-2.0-flash

The neuron detects first-person self-references—tokens that signal the author or speaker talking about themselves or their personal experiences.

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various

Features: 131,072

Data Type: float32

Hook Name: blocks.11.hook_resid_post

Architecture: standard

Context Size: 1,024

Dataset: monology/pile-uncopyrighted

Negative Logits

"attacks": -0.08

"Mc": -0.07

" मल": -0.07

" automate": -0.07

" provisioning": -0.07

" migr": -0.06

"项目": -0.06

"ilir": -0.06

" mutable": -0.06

" corners": -0.06

Positive Logits

"итай": 0.08

"essaging": 0.07

"+$": 0.07

"CLUS": 0.06

"BOOT": 0.06

"避": 0.06

"'][$": 0.06

"UNC": 0.06

"EDITOR": 0.06

"Β": 0.06
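
Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. This page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects` and the toy vectors are hypothetical and do not come from the dashboard.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=1):
    """Rank vocabulary tokens by (decoder direction) . (unembedding column).

    decoder_dir: list of floats, one per model dimension (hypothetical toy data).
    unembed: list of rows, one per model dimension, each with len(vocab) entries.
    Returns (most negative k, most positive k) as lists of (token, score).
    """
    effects = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy example with a 2-dimensional model and a 3-token vocabulary.
vocab = ["a", "b", "c"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.25, -0.5, 0.0],
    [0.0, 0.5, -1.0],
]
negative, positive = top_logit_effects(decoder_dir, unembed, vocab)
print("negative:", negative)
print("positive:", positive)
```

In this toy case token "b" receives the most negative score and token "c" the most positive. The small magnitudes in the lists above (at most 0.08 in absolute value) would, under the same assumption, suggest that this feature has only a weak direct effect on any single output token.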

Activations Density 0.161%
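
Activation density is usually read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading on made-up data. The helper `activation_density`, the zero threshold, and the toy activations are assumptions for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 10,000 token positions, 16 of which activate the feature.
toy_activations = [0.0] * 9984 + [0.5] * 16
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.161% means the feature is active on roughly 1 in every 620 tokens of the dashboard dataset.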