
Explanations

I

np_max-act · gemini-2.0-flash

Sentences that begin an answer post written in the first person (e.g., the start of an "A:" response where the author says "I" or "I think").

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
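The configuration says this SAE uses the "standard" architecture and reads activations at `blocks.7.hook_resid_post`, the residual stream after layer 7. The following is a minimal sketch of how a standard ReLU SAE maps a residual-stream vector to feature activations and back. It assumes the common formulation f = ReLU(W_enc(x - b_dec) + b_enc) and x_hat = W_dec f + b_dec; the page does not spell this out. The function names and toy sizes are hypothetical. The real SAE has 131,072 features over Llama 3.1 8B's 4,096-dimensional residual stream.

```python
import random

def relu(v):
    # Elementwise ReLU: negative pre-activations become 0.0
    return [max(0.0, x) for x in v]

def matvec(M, v):
    # M is a list of rows; returns the product M @ v
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sae_encode(x, W_enc, b_enc, b_dec):
    # Assumed standard form: f = ReLU(W_enc (x - b_dec) + b_enc)
    # W_enc has one row per feature (n_features x d_model)
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return relu(pre)

def sae_decode(f, W_dec, b_dec):
    # Reconstruction: x_hat = W_dec f + b_dec
    # W_dec has one row per residual dimension (d_model x n_features)
    return [r + b for r, b in zip(matvec(W_dec, f), b_dec)]

# Toy usage (hypothetical sizes; the real SAE is 4,096 -> 131,072)
random.seed(0)
D_MODEL, N_FEATURES = 8, 32
W_enc = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(N_FEATURES)]
# Transposed encoder used as the decoder, for this toy only
W_dec = [[W_enc[j][i] for j in range(N_FEATURES)] for i in range(D_MODEL)]
b_enc = [0.0] * N_FEATURES
b_dec = [0.0] * D_MODEL

x = [random.gauss(0, 1) for _ in range(D_MODEL)]  # stand-in residual vector
f = sae_encode(x, W_enc, b_enc, b_dec)            # sparse, non-negative features
x_hat = sae_decode(f, W_dec, b_dec)               # approximate reconstruction
print(sum(1 for a in f if a > 0), "of", N_FEATURES, "features active")
```

Subtracting `b_dec` before encoding centers the input on the decoder bias, so the same bias is removed on the way in and added back on the way out.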


Negative Logits

"ela"  -0.07
"给我"  -0.07
"ternal"  -0.07
"erialization"  -0.06
"-aut"  -0.06
"_vector"  -0.06
"emale"  -0.06
"Language"  -0.06
".prop"  -0.06
"SHORT"  -0.06

Positive Logits

"FAQ"  0.06
" Indones"  0.06
":http"  0.06
":^("  0.06
"시"  0.06
" örgüt"  0.06
" полос"  0.06
" extends"  0.06
" REPL"  0.06
" Güven"  0.06
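The logit lists show which output tokens this feature pushes up or down. The page does not state how these values were computed. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and then sort the per-token scores. The following is a minimal sketch with toy matrices; `feature_logits` and `top_k` are hypothetical helpers, not a dashboard API.

```python
def feature_logits(d, W_U):
    # d: the feature's decoder direction (length d_model)
    # W_U: unembedding as d_model rows x vocab columns
    # Returns one score per vocabulary token: d @ W_U
    vocab = len(W_U[0])
    return [sum(d[i] * W_U[i][t] for i in range(len(d))) for t in range(vocab)]

def top_k(logits, tokens, k, largest=True):
    # Sort token indices by score; largest=False gives the most negative first
    order = sorted(range(len(logits)), key=lambda t: logits[t], reverse=largest)
    return [(tokens[t], logits[t]) for t in order[:k]]

# Toy example: 2-dimensional residual stream, 3-token vocabulary
tokens = ["a", "b", "c"]
W_U = [[0.5, 0.1, -0.2],
       [0.0, 0.3, 0.4]]
d = [1.0, -1.0]

logits = feature_logits(d, W_U)
print(top_k(logits, tokens, 1))                 # [('a', 0.5)]
print(top_k(logits, tokens, 1, largest=False))  # "c" has the most negative score
```

With real weights, the same calls would use the feature's 4,096-dimensional decoder row and the model's full vocabulary, keeping the top and bottom ten tokens.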

Activations Density 0.034%
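Activation density is the fraction of sampled tokens on which the feature fires. This sketch assumes that "fires" means a strictly positive activation. At 0.034%, the feature is active on roughly 34 of every 100,000 tokens. `activation_density` is a hypothetical helper.

```python
def activation_density(acts):
    # acts: one feature's activation values across sampled tokens
    # Density = (# tokens with activation > 0) / (# tokens)
    fired = sum(1 for a in acts if a > 0)
    return fired / len(acts)

# 34 firing tokens out of 100,000 sampled tokens
acts = [0.0] * 100_000
for i in range(0, 34 * 1000, 1000):  # place 34 positive activations
    acts[i] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.034%
```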

No Known Activations
