INDEX

Explanations

lama
np_max-act · gemini-2.0-flash

It detects mentions of the Llama language model name (and its letter-case/variant tokenizations).
oai_token-act-pair · gpt-5-mini, triggered by @yooniel31


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_7/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.7.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted


Negative Logits

Token             Logit
"_ud"             -0.08
" forge"          -0.08
" deposits"       -0.07
"-reviewed"       -0.07
" survives"       -0.07
"ategorized"      -0.06
"顾"              -0.06
" received"       -0.06
" поверх"         -0.06
" marginalized"   -0.06

Positive Logits

Token             Logit
"mare"            0.08
"안마"            0.07
" кожного"        0.06
"čet"             0.06
"_COMM"           0.06
"χώ"              0.06
"_factory"        0.06
" arithmetic"     0.06
"name"            0.06
" Enums"          0.06
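
For readers unfamiliar with how logit lists like these are usually produced, here is a minimal sketch. It assumes (the page does not say so) that each score is the dot product of the feature's decoder direction with a token's unembedding column, and that the lists are the lowest and highest scores. The function name `feature_logit_effects`, the placeholder tokens, and all numbers are hypothetical toy values, not data from this feature.

```python
def feature_logit_effects(decoder_direction, unembed, vocab, k=1):
    """Return (k lowest, k highest) token scores for one feature.

    Assumption: score(token) = decoder_direction . unembed[:, token].
    unembed is a d_model x vocab_size matrix stored as a list of rows.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(
            decoder_direction[i] * unembed[i][j]
            for i in range(len(decoder_direction))
        )
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary (hypothetical).
decoder_direction = [1.0, -0.5]
unembed = [
    [0.5, -0.25, 0.0],
    [0.0, 0.5, -0.5],
]
vocab = ["tok_a", "tok_b", "tok_c"]

negative, positive = feature_logit_effects(decoder_direction, unembed, vocab)
print("negative:", negative)
print("positive:", positive)
```

The small magnitudes in the lists above (at most 0.08 in absolute value) would, under this reading, mean the feature nudges output logits only slightly at layer 7.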

Activations Density 0.001%
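
A density of 0.001% corresponds to roughly one token in 100,000. The sketch below shows one way such a figure could be computed, assuming (this is not stated on the page) that density means the fraction of tokens on which the feature's activation is above zero. The helper `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as firing on a token when its
    activation is strictly greater than the threshold.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: the feature fires on 1 of 100,000 tokens (hypothetical values).
toy_activations = [0.0] * 99_999 + [3.2]
density = activation_density(toy_activations)
print(f"{density * 100:.3f}%")
```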


No Known Activations