Explanations

obstacles and hindrance

np_max-act · gemini-2.0-flash

The neuron selectively activates on sub-word pieces of gerunds and present‐participle verbs (i.e. “-ing” forms).

oai_token-act-pair · o4-mini · triggered by @xinyanhu8

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
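The configuration describes a "standard" SAE with 131,072 features reading the layer-11 residual stream of Llama 3.1 8B Instruct. That model's residual width is 4,096, so the dictionary is 32-fold wider than its input (131,072 / 4,096 = 32). The sketch below shows what a standard ReLU SAE encode/decode step computes, assuming the common formulation f = ReLU(W_enc (x - b_dec) + b_enc) and x_hat = W_dec f + b_dec. Whether this particular trainer subtracts b_dec before encoding, and the matrix orientation used here, are assumptions; the toy weights are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc).

    w_enc has shape (n_features, d_in). Subtracting b_dec first is an
    assumption about this trainer, not something the dashboard states.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = matvec(w_enc, centered)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = W_dec f + b_dec, with w_dec of shape (d_in, n_features)."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


# Dimensions of this SAE (Llama 3.1 8B has a 4,096-wide residual stream).
D_IN, N_FEATURES = 4096, 131_072
print("expansion factor:", N_FEATURES // D_IN)  # 32

# Hypothetical toy SAE: d_in = 2, n_features = 3.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, -0.5, 0.0]
w_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

x = [2.0, 1.0]
f = encode(x, w_enc, b_enc, b_dec)
x_hat = decode(f, w_dec, b_dec)
print(f)      # [2.0, 0.5, 0.0]
print(x_hat)  # [2.0, 0.5]
```

Each feature reads one row of W_enc and writes one column of W_dec; the ReLU zeroes the third feature here because its pre-activation is negative, which is the mechanism that keeps most of the 131,072 features silent on any given token.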

Negative Logits

(Tokens are quoted; a leading space is part of the token.)

"_identity"      -0.07
"wrong"          -0.06
"GetProperty"    -0.06
" Maths"         -0.06
" copyrighted"   -0.06
"_credentials"   -0.06
"_pcm"           -0.06
" conserve"      -0.05
"_pf"            -0.05
"_resp"          -0.05

Positive Logits

" barriers"       0.10
" imped"          0.09
" obstacles"      0.09
" hinder"         0.08
" obstacle"       0.08
"details"         0.08
" prohibiting"    0.07
" Hind"           0.07
" suppress"       0.07
" muddy"          0.07
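Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. Assuming that is how these values were computed (the page does not say, and a real pipeline may also apply normalization first), a minimal sketch is below. The function names, the five-token vocabulary, the decoder vector, and the unembedding columns are all hypothetical toy values.

```python
from typing import Dict, List, Tuple


def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_k(scores: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k highest (positive=True) or lowest (positive=False) scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Hypothetical toy example: d_model = 2, five-token vocabulary.
decoder_dir = [0.5, -0.25]
unembed = {
    " barriers": [0.2, 0.0],
    " obstacles": [0.1, -0.1],
    " hinder": [0.1, 0.0],
    "wrong": [-0.1, 0.1],
    "_identity": [-0.2, 0.0],
}

scores = feature_logits(decoder_dir, unembed)
print(top_k(scores, 2, positive=True))
print(top_k(scores, 2, positive=False))
```

Because the score is a single dot product per token, it reflects only the feature's direct push on the output logits, not any effect routed through later layers.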

Activation Density: 0.024%
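Activation density is the fraction of tokens on which the feature fires. Assuming "fires" means an activation strictly above zero over some sample of tokens (the page states neither the threshold nor the sample size), 0.024% corresponds to roughly one token in 4,167 (1 / 0.00024 ≈ 4,166.7). A minimal sketch of the calculation, using a hypothetical helper and made-up activation values:

```python
from typing import Iterable


def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in acts:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Hypothetical activations for 10,000 tokens: the feature fires on 3 of them.
acts = [0.0] * 10_000
acts[17] = 1.8
acts[4_203] = 0.6
acts[9_999] = 2.4

density = activation_density(acts)
print(f"{density:.3%}")  # 0.030%

# The reported 0.024% implies about one activating token per ~4,167 tokens.
print(round(1 / 0.00024))  # 4167
```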
