Explanations

causing or preventing

np_max-act · gemini-2.0-flash

descriptions of inventions and technical innovations related to locking mechanisms.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8

The neuron activates on verbs that signal safety or hazard actions—especially words like “prevent,” “secure,” or “cause.”

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various

Features: 131,072

Data Type: float32

Hook Name: blocks.11.hook_resid_post

Architecture: standard

Context Size: 1,024

Dataset: monology/pile-uncopyrighted

Negative Logits

-0.06

 dür

-0.06

agger

-0.06

 apoptosis

-0.06

esktop

-0.06

audi

-0.06

订

-0.06

انس

-0.06

ST

-0.06

Positive Logits

 sexist

0.07

 Lindsay

0.07

%");↵

0.06

жень

0.06

 обрат

0.06

-match

0.06

:^{↵

0.06

 생성

0.06

 директор

0.06

 ціл

0.06

Activations Density 0.179%

No Known Activations
