Explanations

clearly

np_max-act · gemini-2.0-flash

The neuron fires on emphatic commentary tokens—especially the word “clearly” (and its immediate assertion context) that marks a writer’s emphatic stance.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
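
As a rough illustration of what a "standard" sparse autoencoder on the layer 11 residual stream computes, the sketch below assumes the common ReLU encoder with a linear decoder and a decoder bias subtracted before encoding. That parametrization is an assumption and is not confirmed by this page. The toy weights are hypothetical. The only real numbers are the 131,072 features listed above and the Llama 3.1 8B hidden size of 4,096, which together imply a 32x expansion.

```python
# Minimal sketch of a ReLU sparse autoencoder (assumed meaning of "standard").
# All weights below are toy values for illustration, not the trained SAE.

D_MODEL = 4096          # Llama 3.1 8B residual stream width
N_FEATURES = 131072     # "Features" field on this page
EXPANSION = N_FEATURES // D_MODEL


def sae_encode(x, W_enc, b_enc, b_dec):
    """Return feature activations for one residual-stream vector x."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [
        sum(w * c for w, c in zip(row, centered)) + b
        for row, b in zip(W_enc, b_enc)
    ]
    # ReLU keeps only positive pre-activations, which makes the code sparse.
    return [max(0.0, p) for p in pre]


def sae_decode(acts, W_dec, b_dec):
    """Reconstruct the residual vector; W_dec has one row per feature."""
    return [
        b_dec[j] + sum(acts[i] * W_dec[i][j] for i in range(len(acts)))
        for j in range(len(b_dec))
    ]


# Toy problem: 2-dimensional "residual stream", 3 features.
x = [1.0, -2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.5]
b_dec = [0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

acts = sae_encode(x, W_enc, b_enc, b_dec)
recon = sae_decode(acts, W_dec, b_dec)
print(EXPANSION, acts, recon)
```

In this toy case only the first feature is active, so the reconstruction recovers just the part of the input that lies along that feature's decoder row.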

Negative Logits

-0.07

たら

-0.06

 prefs

-0.06

cs

-0.06

 unspecified

-0.06

INSTANCE

-0.06

рол

-0.06

agnostic

-0.06

 Package

-0.06

 أغسطس

-0.06

Positive Logits

" istiyor"   0.09
"lara"       0.07
"Clearly"    0.06
"unya"       0.06
"_radio"     0.06
" ALLOW"     0.06
" киш"       0.06
"=tmp"       0.06
" Clearly"   0.06
"ZONE"       0.06
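
Logit lists like these are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below shows that idea under stated assumptions: the page does not say how its values were computed (for example, whether a final normalization is applied), and the vocabulary, matrix, and direction here are hypothetical toy values, not data from this feature.

```python
# Sketch: rank vocabulary tokens by the dot product between a feature's
# decoder direction and each token's unembedding vector (toy values only).


def logit_effects(feature_dir, unembed_rows, vocab):
    """Return (token, score) pairs sorted from most positive to most negative."""
    scores = [
        (tok, sum(d * u for d, u in zip(feature_dir, row)))
        for tok, row in zip(vocab, unembed_rows)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


vocab = ["tok_a", "tok_b", "tok_c"]                 # hypothetical tokens
unembed_rows = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # one row per token
feature_dir = [0.5, 0.25]                           # hypothetical decoder row

ranked = logit_effects(feature_dir, unembed_rows, vocab)
top_positive = ranked[0]    # token the feature promotes most
top_negative = ranked[-1]   # token the feature suppresses most
print(top_positive, top_negative)
```

The head of the ranking corresponds to a "positive logits" list and the tail to a "negative logits" list.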

Activations Density 0.034%
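
To put the density figure in perspective, the short calculation below converts it into tokens per activation and expected active tokens per context. It assumes that the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and it reuses the 1,024-token context size listed in the configuration.

```python
# Back-of-the-envelope reading of "Activations Density 0.034%".
# Assumption: density = fraction of tokens with nonzero activation.

density_pct = 0.034      # from the page
context_size = 1024      # "Context Size" field on the page

density = density_pct / 100.0
tokens_per_activation = 1.0 / density
expected_active_per_context = density * context_size

print(f"about 1 active token per {tokens_per_activation:.0f} tokens")
print(f"about {expected_active_per_context:.2f} active tokens per context")
```

Under that assumption the feature is active on roughly one token in every 2,900, or about 0.35 tokens per 1,024-token context, which is consistent with a feature tied to a single word such as "clearly".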
