Explanations

normal

np_max-act · gemini-2.0-flash

This neuron detects reassurance language indicating that something is normal or common (e.g., words like “normal,” “common,” and similar).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

".navigate": -0.07
"callback": -0.06
" Арх": -0.06
" 분야": -0.06
".loading": -0.06
" injustice": -0.06
" Masters": -0.06
"слов": -0.06
" кого": -0.06
" Basically": -0.06

Positive Logits

"¿": 0.08
"itized": 0.07
"/L": 0.07
"/R": 0.06
"인지": 0.06
" settling": 0.06
"jezd": 0.06
" theoretical": 0.06
" Arist": 0.06
"fd": 0.06

Activations Density 0.010%
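
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a nonzero activation; the function name `activation_density` and the sample data are hypothetical and only illustrate what a value of 0.010% would mean under that assumption (about 1 token in 10,000).

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all positions).
    This is an illustrative definition, not confirmed by the dashboard.
    """
    acts = np.asarray(acts, dtype=np.float32)
    if acts.size == 0:
        return 0.0
    return float((acts > threshold).sum() / acts.size)


# Hypothetical data: the feature fires on 1 of 10,000 token positions.
acts = np.zeros(10_000, dtype=np.float32)
acts[123] = 2.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density this low means the feature is very sparse, which fits a feature tied to a narrow set of words.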

No Known Activations