Explanations

according to

np_max-act · gemini-2.0-flash

The neuron fires on phrases attributing information to a source—especially “According to X”‐style citations.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

！”  -0.07
uncio  -0.07
ingga  -0.07
 AttributeSet  -0.07
$v  -0.07
IsRequired  -0.06
—if  -0.06
.Mesh  -0.06
افية  -0.06
assessment  -0.06

Positive Logits

 toho  0.06
(secret  0.06
stacle  0.06
sb  0.06
อด  0.06
 discontinued  0.06
λης  0.06
ps  0.06
(angle  0.06
 Eine  0.06

Activations Density 0.047%
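
The page does not define the density figure. Assuming it is the fraction of sampled token positions on which the feature has a nonzero activation, a minimal Python sketch of that calculation could look like the following. The function name `activation_density` and the toy data are hypothetical and are not taken from the dashboard or from any library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    (here: above-threshold) feature activation.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 47 active positions out of 100,000 tokens.
toy_activations = [0.0] * 99_953 + [1.5] * 47
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.047%
```

Under this reading, a density of 0.047% means the feature is active on roughly 47 of every 100,000 token positions.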

No Known Activations