Explanations

strange

np_max-act · gemini-2.0-flash

The neuron detects the word “Strange,” especially when it appears as a standalone title or heading.

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token            Logit
"Potential"      -0.07
"Pi"             -0.07
" Exec"          -0.07
"Lit"            -0.07
"VS"             -0.07
" Vincent"       -0.07
"Lit"            -0.07
" Millennium"    -0.07
"Wilson"         -0.07
"It"             -0.06

Positive Logits

Token            Logit
" Strange"       0.11
" strange"       0.10
" stranger"      0.09
" strang"        0.08
" strangers"     0.08
"�"              0.08
"�"              0.08
"Strange"        0.08
" Stranger"      0.07
"reste"          0.07
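
Per-token values like the ones listed above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The following is a minimal sketch of that idea, assuming this is how the dashboard derives them. The matrices are random toy data, and `top_logit_tokens`, `decoder_vec`, `W_U`, and `vocab` are hypothetical names, not part of any library.

```python
import numpy as np

def top_logit_tokens(decoder_vec, W_U, vocab, k=3):
    """Rank vocabulary tokens by the logit shift a feature direction induces.

    decoder_vec: (d_model,) feature decoder direction (hypothetical input)
    W_U:         (d_model, n_vocab) unembedding matrix (hypothetical input)
    vocab:       list of n_vocab token strings
    """
    effects = decoder_vec @ W_U        # (n_vocab,) logit contribution per token
    order = np.argsort(effects)        # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data only: these are not the real model or SAE weights.
rng = np.random.default_rng(0)
d_model, n_vocab = 8, 20
W_U = rng.normal(size=(d_model, n_vocab))
vocab = [f"tok{i}" for i in range(n_vocab)]
decoder_vec = rng.normal(size=d_model)

positive, negative = top_logit_tokens(decoder_vec, W_U, vocab)
for tok, val in positive:
    print(f"{tok!r:>8} {val:+.2f}")
```

Printing tokens with `repr` keeps leading spaces visible, which matters here because " Strange" and "Strange" are distinct tokens.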

Activations Density 0.007%
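
Activation density is usually read as the fraction of token positions on which the feature has a nonzero activation. At 0.007% that is roughly 7 tokens in every 100,000. A minimal sketch of the calculation, assuming this definition and using a made-up activation array (`activation_density` is a hypothetical helper):

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the feature activation is > 0."""
    acts = np.asarray(acts)
    return float((acts > 0).mean())

# Made-up activations: 100,000 token positions, 7 of them active.
acts = np.zeros(100_000)
acts[:7] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.007%
```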
