Explanations

Actions

np_max-act · gemini-2.0-flash

The neuron fires on verbs and verb forms that issue user-actions or commands (e.g. block, ban, use, disable, limit) indicating instructions or directives.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token                    Logit
" Kate"                  -0.06
"\tboard"                -0.06
" Canon"                 -0.06
" Laure"                 -0.06
" Juice"                 -0.06
"raphics"                -0.06
"γ"                      -0.06
"izons"                  -0.06
"_restrict"              -0.06
"flux"                   -0.06

Positive Logits

Token                    Logit
" were"                  0.07
" bell"                  0.07
" experimenting"         0.06
"CppMethodIntialized"    0.06
"\techo"                 0.06
" LAST"                  0.06
"、い"                    0.06
"andles"                 0.06
"_HIDDEN"                0.06
"ButtonItem"             0.06
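
The page does not say how the logit lists above are produced. One common approach in feature dashboards, assumed here and not confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the largest and smallest results. A minimal sketch of that idea follows; the function names, the toy vectors, and the toy tokens are hypothetical and are not taken from this feature.

```python
def feature_logit_effects(decoder_direction, unembed_rows):
    """Dot the feature's decoder direction with each token's unembedding vector.

    decoder_direction: list of floats (hypothetical feature direction).
    unembed_rows: dict mapping token string -> list of floats of the same length.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_direction, row))
        for token, row in unembed_rows.items()
    }


def top_and_bottom(effects, k=10):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy 2-dimensional example with made-up numbers (not real model weights).
toy_direction = [1.0, -1.0]
toy_unembed = {
    "tok_a": [0.5, 0.0],
    "tok_b": [0.0, 0.5],
    "tok_c": [0.25, 0.25],
}
effects = feature_logit_effects(toy_direction, toy_unembed)
positive, negative = top_and_bottom(effects, k=1)
```

Values as small as those listed here (about ±0.06) would, under this assumption, suggest that the feature has only a weak direct effect on any single output token.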

Activations Density 0.065%
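
The density figure is read here as the fraction of sampled tokens on which the feature has a nonzero activation. That definition is an assumption, since the page does not state it. Under that reading, 0.065% corresponds to roughly 6 or 7 firing tokens per 10,000. A minimal sketch, with a hypothetical function name and toy data:

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is above zero.

    Assumes 'density' means the share of tokens on which the feature fires;
    the page itself does not define the term.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Toy activations for four token positions (made-up values).
toy_activations = [0.0, 0.0, 1.2, 0.0]
density = activation_density(toy_activations)  # one of four positions fires
```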
