Explanations

Citations

np_max-act · gemini-2.0-flash

The neuron selectively activates on in-text citation markers and reference labels (e.g., bracketed “[@HS…]” tokens and author-initial tags).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token           Logit
" comm"         -0.07
"_inf"          -0.07
" Sawyer"       -0.06
" insured"      -0.06
" donation"     -0.06
"rust"          -0.06
" affidavit"    -0.06
"Replacing"     -0.06
" Raid"         -0.06
"vos"           -0.06

Positive Logits

Token           Logit
" doğrult"      0.07
"허"            0.06
" Profes"       0.06
"dem"           0.06
" здійс"        0.06
" Değ"          0.06
" Vous"         0.06
" بم"           0.06
"าษฎ"           0.06
" Friendship"   0.06

Activations Density: 0.014%
