Explanations

Descriptive

np_max-act · gemini-2.0-flash

This neuron activates on descriptive adjectives and adverbs (i.e. qualifiers of degree or traits).

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

Configuration: andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1
Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

 halluc    -0.07
 blow    -0.07
utex    -0.06
包括    -0.06
tır    -0.06
_sec    -0.06
�认    -0.06
 reply    -0.06
 Dirk    -0.06
 Blizzard    -0.06

Positive Logits

 البحر    0.06
dirname    0.06
 arts    0.06
 thân    0.06
 traveler    0.06
.mar    0.06
Pow    0.06
 straw    0.06
무    0.05
 Angiosper    0.05
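
The page does not say how the negative and positive logit lists are computed. One common convention, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below illustrates that idea on made-up toy data; the function names, the two-dimensional "model", and the three-token vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature direction with each vocabulary column.

    `unembed` is laid out as [d_model][vocab_size] (an assumption).
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def bottom_and_top(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[list, list]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy data (made up): d_model = 2, vocabulary of 3 tokens.
toy_direction = [1.0, -0.5]
toy_unembed = [
    [0.2, -0.1, 0.0],
    [0.0, 0.2, -0.6],
]
toy_vocab = ["a", "b", "c"]

scores = logit_effects(toy_direction, toy_unembed)
bottom, top = bottom_and_top(scores, toy_vocab, k=1)
print("most negative:", bottom)
print("most positive:", top)
```

With values as small as those listed here (around ±0.06), the ranking may carry little interpretive weight, so the token lists are best read alongside the activation examples rather than on their own.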

Activations Density 0.064%
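
Read as a fraction of token positions, a density of 0.064% corresponds to the feature firing on roughly 6 or 7 tokens in every 10,000. A minimal sketch of that calculation follows, assuming density means the share of positions whose activation exceeds zero (the page does not state its exact definition or threshold). The function name and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Toy data (made up): 10,000 token positions, 6 of them active.
toy_acts = [0.0] * 10_000
for pos in (12, 480, 1999, 5000, 7321, 9998):
    toy_acts[pos] = 1.5

density = activation_density(toy_acts)
print(f"{density:.3%}")
```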

No Known Activations