Explanations

Contains "rh" or "h" in words

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

" Monter": -0.08
"崁": -0.07
" таким": -0.07
"commit": -0.07
"здание": -0.07
" pint": -0.07
"찐": -0.07
"śmie": -0.07
"释": -0.07
" setContentView": -0.07

Positive Logits

"ighb": 0.07
"經常": 0.07
"œur": 0.07
"_Last": 0.07
"涌现出": 0.07
" duplicates": 0.06
" guessed": 0.06
".about": 0.06
"Wonder": 0.06
"脸上": 0.06

Activations Density 0.014%
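
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading on made-up data; the function name, the threshold parameter, and the toy activations are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means firing frequency over sampled tokens.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 100,000 token positions, 14 of which fire.
toy = [0.0] * 100_000
for i in range(14):
    toy[i * 7_000] = 1.5  # arbitrary positive activation values

density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.014% corresponds to roughly 14 firing positions per 100,000 tokens, which is a sparse feature.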

No Known Activations