Explanations

single (np_max-act · gemini-2.0-flash)

This neuron activates on occurrences of the word “single.” (oai_token-act-pair · o4-mini, triggered by @xinyanhu8)

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token         Logit
"Park"        -0.07
"HOW"         -0.07
" theorem"    -0.07
" therap"     -0.07
"How"         -0.07
"[df"         -0.07
"how"         -0.07
" Alvarez"    -0.07
"OPP"         -0.07
"up"          -0.07

Positive Logits

Token           Logit
" single"       0.17
"single"        0.13
" Single"       0.12
".Single"       0.10
"Single"        0.10
" SINGLE"       0.09
"-single"       0.09
"LE"            0.08
"ingle"         0.08
" singleton"    0.08
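
For readers unfamiliar with logit lists like the ones above: a common way to produce them is to project a feature's decoder direction through the model's unembedding matrix and sort the resulting per-token values. Whether this dashboard computes its numbers exactly this way is an assumption. The sketch below uses a made-up 2-dimensional model and a 4-token vocabulary; `top_logit_effects`, `decoder_dir`, `unembed`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def top_logit_effects(decoder_dir, unembed, vocab, k=2):
    """Return the k most-promoted and k most-suppressed tokens.

    decoder_dir: (d_model,) feature direction (hypothetical values here)
    unembed:     (d_model, vocab_size) unembedding matrix (toy values here)
    """
    effects = decoder_dir @ unembed              # one value per vocabulary token
    order = np.argsort(effects)                  # ascending order
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    return top, bottom

# Toy data, NOT taken from the real model or SAE.
vocab = [" single", "single", " Park", " theorem"]
unembed = np.array([[1.0, 0.8, -0.4, -0.3],
                    [0.0, 0.1,  0.2, -0.1]])
decoder_dir = np.array([0.2, 0.0])

top, bottom = top_logit_effects(decoder_dir, unembed, vocab)
print("promoted:  ", top)
print("suppressed:", bottom)
```

With these toy values the most-promoted token is " single" and the most-suppressed is " Park", mirroring the shape of the lists above rather than reproducing their numbers.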

Activations Density: 0.026%
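
To put the density figure in perspective, the arithmetic below converts it into an expected number of activating tokens per context window. It assumes that "activations density" means the fraction of tokens on which the feature is nonzero, and that tokens fire independently of position; both are assumptions, not statements from the dashboard.

```python
# Values copied from the dashboard.
DENSITY_PERCENT = 0.026   # Activations Density
CONTEXT_SIZE = 1024       # Context Size

# Assumption: density = fraction of tokens with a nonzero activation.
density = DENSITY_PERCENT / 100.0
expected_per_context = density * CONTEXT_SIZE   # activating tokens per 1,024-token context
tokens_per_activation = 1.0 / density           # average tokens between activations

print(f"expected activations per context: {expected_per_context:.2f}")
print(f"about one activation every {tokens_per_activation:.0f} tokens")
```

Under these assumptions the feature fires on roughly one token in every few thousand, which is in line with a feature tied to a single word.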

No Known Activations