Explanations

AI

np_max-act · gemini-2.0-flash

instances of the first-person pronoun "I" (self-references by the assistant).

oai_token-act-pair · gpt-5-mini Triggered by @yooniel31

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

 cyclists    -0.07
_co    -0.07
_col    -0.07
-ob    -0.06
_input    -0.06
(callback    -0.06
 increment    -0.06
unto    -0.06
.runners    -0.06

Positive Logits

Tue    0.06
ekkür    0.06
 milfs    0.06
 STACK    0.06
Tại    0.06
 saturated    0.06
KeyType    0.06
Infinity    0.06
 bolster    0.06
suk    0.05

Activations Density 0.108%
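
As a rough illustration of what the density figure above could mean, the sketch below assumes it is the fraction of sampled token positions on which the feature's activation is nonzero. That reading is an assumption, not something this page states, and the function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 10,000 token positions, the feature fires on 11 of them.
toy_activations = [0.0] * 9989 + [0.7] * 11
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.108% would correspond to the feature firing on roughly one token in every 900 to 1,000.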

No Known Activations