
Explanations

first

np_max-act · gemini-2.0-flash

The neuron detects mentions of “first time” (or similar phrasing) that signal a repeated or prior occurrence.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted


Negative Logits

Logit  Token
-0.07   nostra
-0.07   sanctuary
-0.07  excluding
-0.07  혹
-0.06  _DAY
-0.06  rador
-0.06   Saga
-0.06  논
-0.06  NAS

Positive Logits

Logit  Token
0.06    openid
0.06    multit
0.06   (=)
0.06   ↵
0.06   _almost
0.06   weit
0.06    concentrates
0.06   )":
0.06    Stap
0.06    correlated
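
Logit tables like the two above are commonly derived by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the resulting values. Whether this dashboard computes them exactly that way is an assumption. The sketch below uses a hypothetical helper, top_logit_effects, with toy vectors and placeholder token names; none of the numbers come from the page.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=3):
    """Return (most negative, most positive) lists of (token, effect) pairs.

    decoder_vec: list of d floats, the feature's decoder direction (toy values).
    unembed: d rows of len(vocab) floats, a toy unembedding matrix.
    """
    # Effect on each token's logit = dot(decoder_vec, column j of unembed).
    effects = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    # Lowest k come first; highest k are reversed so the largest leads.
    return ranked[:k], ranked[-k:][::-1]


# Hypothetical toy data, chosen so the arithmetic is exact in floating point.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_vec = [1.0, 2.0]
unembed = [
    [0.5, -1.0, 0.25, 0.0],
    [-1.0, 0.25, 0.5, 0.0],
]

negative, positive = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

With values as small and as uniform as the ones listed here (all near ±0.06), the ranking says little about what the feature promotes, which is consistent with a feature read out from a middle layer rather than one that writes directly to the output.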

Activations Density 0.009%
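
The density figure is usually read as the fraction of sampled tokens on which the feature fires. A minimal Python sketch under that assumption follows; the function name, the zero threshold, and the toy activations are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: the feature fires on 1 of 4 token positions.
toy_activations = [0.0, 0.0, 1.5, 0.0]
density = activation_density(toy_activations)
print(f"{density:.1%}")
```

Under this reading, a density of 0.009% would correspond to roughly 9 active tokens per 100,000 sampled.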


No Known Activations