Explanations

Code/reports/documents
(np_max-act · gemini-2.0-flash)

This neuron primarily activates on common small “function” words: articles (a, the), auxiliaries/modals (will, can), conjunctions (that), and simple prepositions.
(oai_token-act-pair · o4-mini, triggered by @xinyanhu8)

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

getC        -0.06
 Россия     -0.06
/people     -0.06
-img        -0.06
云          -0.06
 Pant       -0.06
GM          -0.06
 Imperial   -0.06
 Snowden    -0.06
books       -0.06

Positive Logits

_lost             0.07
 ylabel           0.06
_typeDefinition   0.06
 νεφ              0.06
 disc             0.06
Dip               0.06
loyd              0.06
 Spielberg        0.06
lik               0.06
 replic           0.06

Activations Density: 0.278%
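
The density figure above is most naturally read as the fraction of sampled tokens on which this feature has a nonzero activation. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard or from any library.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all, so the default threshold is 0.0.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (hypothetical): the feature fires on 3 of 1,000 tokens.
toy_activations = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.278% would mean the feature fires on roughly 28 of every 10,000 tokens in the sampled text.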

No Known Activations