Explanations

reporting/explanations

np_max-act · gemini-2.0-flash

The neuron is sensitive to negation and contradiction cues (e.g. “No,” “not,” “while,” “does not”), flagging factual inconsistencies.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
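
The "standard" architecture label usually denotes a plain ReLU sparse autoencoder. The sketch below is a minimal pure-Python illustration under that assumption, not the trainer's actual code. Subtracting the decoder bias before encoding, the function names `sae_encode` and `sae_decode`, and the toy 2-dimensional weights are all hypothetical. The constant `D_MODEL = 4096` assumes Llama 3.1 8B's residual-stream width, which puts the 131,072 features at a 32x expansion.

```python
# Minimal sketch of a "standard" (ReLU) sparse autoencoder forward pass.
# Assumptions: the encoder input is centered by the decoder bias, and weights
# are plain nested lists. All names here are hypothetical, not the trainer's API.

D_MODEL = 4096                # assumed residual-stream width of Llama 3.1 8B
D_SAE = 131_072               # "Features" from the configuration
EXPANSION = D_SAE // D_MODEL  # dictionary size as a multiple of d_model


def relu(values):
    """Elementwise ReLU."""
    return [max(0.0, v) for v in values]


def sae_encode(x, W_enc, b_enc, b_dec):
    """f = ReLU((x - b_dec) @ W_enc + b_enc); W_enc is d_model x d_sae."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [
        sum(centered[i] * W_enc[i][j] for i in range(len(centered))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return relu(pre)


def sae_decode(f, W_dec, b_dec):
    """x_hat = f @ W_dec + b_dec; W_dec is d_sae x d_model."""
    return [
        sum(f[j] * W_dec[j][i] for j in range(len(f))) + b_dec[i]
        for i in range(len(b_dec))
    ]


# Toy example: d_model = 2, d_sae = 3.
W_enc = [[1.0, 0.0, -1.0],
         [0.0, 1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
b_dec = [0.0, 0.0]

f = sae_encode([2.0, 1.0], W_enc, b_enc, b_dec)  # [2.0, 1.0, 0.0]
x_hat = sae_decode(f, W_dec, b_dec)              # [2.0, 1.0]
```

The ReLU zeroes the third feature's negative pre-activation, which is the sparsity mechanism: only features with a positive pre-activation contribute to the reconstruction.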

Negative Logits

" courseId"  -0.07
"Rh"  -0.06
" UPPER"  -0.06
" caught"  -0.06
" refugees"  -0.06
"sø"  -0.06
"(filepath"  -0.06
" hips"  -0.06
" tragic"  -0.06
"Texto"  -0.06

Positive Logits

".logo"  0.08
"ocrine"  0.07
"_company"  0.07
" этого"  0.06
"ault"  0.06
" Bottle"  0.06
".usage"  0.06
"шей"  0.06
"tement"  0.06
"χει"  0.06
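
Dashboards of this kind usually derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that procedure and ignores the final layer norm. `logit_effects`, `top_and_bottom`, and the toy four-token vocabulary are hypothetical.

```python
# Sketch of ranking tokens by a feature's direct logit effect.
# Assumption: effect = decoder_row @ W_U, with no final layer norm applied.


def logit_effects(decoder_row, W_U, vocab):
    """Score each vocabulary token by decoder_row @ W_U (W_U is d_model x vocab)."""
    return [
        (token, sum(decoder_row[i] * W_U[i][t] for i in range(len(decoder_row))))
        for t, token in enumerate(vocab)
    ]


def top_and_bottom(scored, k):
    """Return the k most-promoted and the k most-suppressed (token, score) pairs."""
    positive = sorted(scored, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(scored, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: d_model = 2, vocabulary of four tokens.
vocab = ["a", "b", "c", "d"]
decoder_row = [1.0, -1.0]
W_U = [[0.5, 0.0, -0.2, 0.1],
       [0.1, 0.3, 0.0, -0.4]]

scored = logit_effects(decoder_row, W_U, vocab)
positive, negative = top_and_bottom(scored, k=2)
# positive tokens: "d", "a"; negative tokens: "b", "c"
```

Sorting the full list twice keeps the sketch short; for a vocabulary of Llama 3's size (about 128K tokens), `heapq.nlargest` and `heapq.nsmallest` would avoid the full sorts.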

Activation Density: 0.013%
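
Activation density is conventionally the fraction of tokens on which the feature's activation is nonzero. The sketch below assumes that definition (with a strict `> 0` threshold) and converts the reported 0.013% into an expected firing count per 1,024-token context. `activation_density` is a hypothetical helper.

```python
# Sketch of activation density and what 0.013% means per context.
# Assumption: "density" = fraction of token positions with activation > 0.


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: the feature fires on 2 of 8 tokens.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 0.25

# Reported density for this feature, converted from a percentage.
density = 0.013 / 100
context_size = 1_024
expected_per_context = density * context_size
print(round(expected_per_context, 3))  # 0.133
```

Under this reading, and assuming firings are spread evenly, the feature fires about once every 7 to 8 contexts on average (1 / 0.133 ≈ 7.5).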

No Known Activations