Explanations

incorrect, false, wrong

np_max-act · gemini-2.0-flash

The neuron detects tokens that signal a correction or negation of a preceding statement (e.g. “incorrect,” “isn’t,” “false,” “wrong,” “quite right”).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

ंब  -0.06
 outfit  -0.06
Если  -0.06
IntoConstraints  -0.06
Utf  -0.06
امبر  -0.06
ソ  -0.06
ฺ  -0.05
Sup  -0.05
aklı  -0.05

Positive Logits

 induces  0.08
 inexperienced  0.07
_qs  0.07
(int  0.07
 Shea  0.07
 aster  0.07
Url  0.07
(CH  0.06
HAL  0.06
 переж  0.06

Activations Density 0.044%
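
Activation density is usually read as the fraction of token positions on which the feature fires. A minimal sketch under that assumption follows. The function name, the zero threshold, and the sample data are hypothetical. The sample is constructed to reproduce a 0.044% density and is not drawn from the actual dataset.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 100,000 token positions, 44 of which are active.
sample = [0.0] * 99_956 + [1.5] * 44
density = activation_density(sample)
print(f"{density:.3%}")
```

At this density the feature fires on roughly 44 of every 100,000 tokens, which is consistent with a sparse feature tied to a narrow set of words.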

No Known Activations