
Explanations

instructions and technical writing

np_max-act · gemini-2.0-flash

logical inconsistencies in claims about the presence of wildfire smoke or flames in images.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8

This neuron activates on text where the model refers to its own reasoning—especially phrases like “your thought process.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
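In SAE tooling, the "standard" architecture label usually denotes a plain ReLU sparse autoencoder. It reads the residual stream at the hook point (here blocks.11.hook_resid_post) and maps it into a much wider dictionary of features (131,072 here). The sketch below assumes the common formulation f = ReLU(W_enc (x - b_dec) + b_enc) and x_hat = W_dec f + b_dec. Two things are assumptions: the tiny dimensions and weights are invented for illustration and are not the trained trainer_1 weights, and it is not confirmed here that this particular trainer subtracts b_dec before encoding.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    # Assumed standard ReLU SAE encoder: f = ReLU(W_enc (x - b_dec) + b_enc).
    # w_enc has shape (n_features x d_model).
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return relu(pre)


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    # Reconstruction: x_hat = W_dec f + b_dec.
    # w_dec has shape (d_model x n_features).
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


if __name__ == "__main__":
    # Toy sizes: d_model = 2, n_features = 3 (the real SAE is far larger).
    w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b_enc = [0.0, -0.5, 0.0]
    b_dec = [0.0, 0.0]
    w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]

    f = encode([0.5, 0.2], w_enc, b_enc, b_dec)
    active = [i for i, a in enumerate(f) if a > 0]
    print("feature activations:", f, "active indices:", active)
    print("reconstruction:", decode(f, w_dec, b_dec))
```

The ReLU is what makes most feature activations exactly zero, which is why a single feature can have a very low activation density.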


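Negative Logits

(Tokens are shown in quotes so leading spaces are visible; ↵ marks a newline.)

"enf"          -0.08
" menuItem"    -0.07
"Hold"         -0.06
" signUp"      -0.06
" vazgeç"      -0.06
" promotes"    -0.06
" meat"        -0.06
" Meat"        -0.06
"],↵"          -0.06
"Ingredients"  -0.06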

Positive Logits

"(currency"     0.07
"operations"    0.07
"且"            0.07
" poco"         0.07
"inars"         0.07
" difficulty"   0.06
"locator"       0.06
" Voyage"       0.06
"гляд"          0.06
"/components"   0.06
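Negative and positive logit lists like these are commonly obtained by projecting the feature's decoder direction onto the model's unembedding vectors and reading off the most negative and most positive entries. The sketch below shows that idea under stated assumptions. It uses a plain dot product per token and ignores the final layer norm. The two-dimensional decoder direction and the four-token vocabulary ("a" to "d") are hypothetical. Neuronpedia's exact computation and normalization may differ.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Direct effect on each token's logit: dot product of the feature's
    # decoder direction with that token's unembedding vector.
    # Assumption: the final layer norm is ignored.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


if __name__ == "__main__":
    toy_unembed = {  # hypothetical unembedding vectors
        "a": [2.0, 0.0],
        "b": [0.0, 3.0],
        "c": [1.0, 1.0],
        "d": [0.5, 0.0],
    }
    neg, pos = top_and_bottom(logit_effects([1.0, -1.0], toy_unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With a real model, the vocabulary has on the order of 128k entries, so only the top and bottom k (10 each above) are displayed.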

Activation Density: 0.006%
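Activation density is typically the fraction of sampled tokens on which the feature's activation is nonzero. Assuming that definition, 0.006% means about 6 activating tokens per 100,000. At the listed context size of 1,024 tokens, that works out to roughly 0.06 activations per context, or on average about one activation per 16 contexts. The helpers below are a hypothetical sketch of that arithmetic, not Neuronpedia's code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float]) -> float:
    # Fraction of tokens where the feature fires (activation > 0).
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one token")
    return sum(1 for a in acts if a > 0) / len(acts)


def expected_hits_per_context(density: float, context_size: int) -> float:
    # Mean number of activating tokens in one context window,
    # assuming density was measured per token over full-length contexts.
    return density * context_size


if __name__ == "__main__":
    density = 0.006 / 100  # 0.006% expressed as a fraction
    per_ctx = expected_hits_per_context(density, 1024)
    print(f"{per_ctx:.4f} activations per context, "
          f"about 1 per {1 / per_ctx:.0f} contexts")
```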

