Explanations

intelligence and reason

np_max-act · gemini-2.0-flash

The neuron fires strongest on words referring to intellectual or reasoning qualities—terms like “intelligence,” “reason,” “rational,” “designed,” and similar cognitive-type descriptors.

oai_token-act-pair · o4-mini · Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

Token               Logit
"Ign"               -0.07
"She"               -0.07
"_H"                -0.07
"Muslim"            -0.07
".Serialization"    -0.07
"탁"                -0.07
"avage"             -0.06
"Tim"               -0.06
" pains"            -0.06
"hpp"               -0.06

Positive Logits

Token               Logit
" UserRepository"   0.06
" Relatives"        0.06
" Vari"             0.06
"ные"               0.06
" cmap"             0.06
".fhir"             0.06
"abol"              0.06
" Option"           0.06
" Raqqa"            0.06
"/full"             0.06

Activations Density 0.082%
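
The page does not define the density figure. As a rough illustration, assuming it means the fraction of token positions on which the feature has a nonzero activation, 0.082% would correspond to about 82 active tokens per 100,000. The function name, the threshold parameter, and the toy data below are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    The dashboard itself does not state how the figure is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (not real activations): 100,000 tokens, 82 of which are active.
toy = [0.0] * 100_000
for i in range(82):
    toy[i * 1000] = 1.5

density = activation_density(toy)
print(f"{density:.3%}")
```

On this reading, the feature is sparse: it stays silent on the large majority of tokens in the sampled text.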
