
Explanations

foreign words or code

np_max-act-logits · gemini-2.0-flash

prescriptive medical or health advice and instructions on what someone should do or take.

oai_token-act-pair · claude-4-5-haiku · Triggered by @xiaoqingsun004


Configuration

andyrdt/saes-qwen2.5-7b-instruct/resid_post_layer_19/trainer_1

Dataset (dashboard): Various
Features: 131,072
Data type: float32
Hook name: blocks.19.hook_resid_post
Architecture: standard
Context size: 1,024
Dataset: monology/pile-uncopyrighted
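The configuration lists a "standard" architecture reading the residual stream at blocks.19.hook_resid_post. As a rough illustration, here is a minimal pure-Python sketch of a ReLU sparse autoencoder forward pass, assuming "standard" means the usual encoder f = ReLU((x - b_dec) @ W_enc + b_enc) with a linear decoder x_hat = f @ W_dec + b_dec. The function names, the 2-dimensional input and the 3 features are hypothetical toy values; the real SAE maps each layer-19 residual vector to 131,072 features.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Row vector times matrix: result[j] = sum_i x[i] * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Feature activations f = ReLU((x - b_dec) @ W_enc + b_enc) (assumed form)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(centered, w_enc), b_enc)]
    return [max(0.0, p) for p in pre]  # ReLU keeps activations non-negative


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = f @ W_dec + b_dec."""
    return [r + b for r, b in zip(matvec(f, w_dec), b_dec)]


# Toy example: d_model = 2, n_features = 3 (hypothetical numbers).
W_ENC = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]
B_ENC = [0.0, -1.5, 0.0]
W_DEC = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
B_DEC = [0.0, 0.0]

features = encode([1.0, 2.0], W_ENC, B_ENC, B_DEC)
print(features)                        # [1.0, 0.5, 0.0]
print(decode(features, W_DEC, B_DEC))  # [1.0, 1.0]
```

In the real setting, x would be the residual-stream activation captured at the hook named above, and only a tiny fraction of the 131,072 entries of f would be nonzero for any given token.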



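Negative Logits

" كرة"  -0.07  (Arabic: "ball")
"俳"  -0.07  (CJK: as in 俳句 "haiku")
".arg"  -0.07
" Caroline"  -0.07
"乩"  -0.07  (Chinese: as in 扶乩 "spirit writing")
" işçi"  -0.07  (Turkish: "worker")
"犰"  -0.07  (Chinese: as in 犰狳 "armadillo")
"��"  -0.06
"afone"  -0.06
"imum"  -0.06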

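Positive Logits

"不含"  0.08  (Chinese: "excluding, does not contain")
"_RATE"  0.08
"/logger"  0.07
"_fail"  0.07
"有关部门"  0.07  (Chinese: "relevant authorities")
"阿姨"  0.07  (Chinese: "auntie")
".configuration"  0.07
"_stub"  0.07
" проблемы"  0.07  (Russian: "problems")
"ureen"  0.07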
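One common way to produce positive and negative logit lists like these is to project a feature's decoder direction through the model's unembedding matrix W_U and rank the resulting per-token scores. The sketch below assumes that is how the dashboard builds these lists; feature_logits, top_and_bottom and the 4-token toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple


def feature_logits(decoder_row: List[float], w_u: List[List[float]], vocab: List[str]) -> Dict[str, float]:
    """Score each token by decoder_row @ W_U (w_u is d_model x vocab_size)."""
    return {
        tok: sum(decoder_row[i] * w_u[i][j] for i in range(len(decoder_row)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
DECODER_ROW = [1.0, -1.0]
W_U = [[0.5, 0.0, 1.0, -1.0],
       [0.0, 0.5, -1.0, 1.0]]
VOCAB = ["a", "b", "c", "d"]

positive, negative = top_and_bottom(feature_logits(DECODER_ROW, W_U, VOCAB), k=2)
print(positive)  # [('c', 2.0), ('a', 0.5)]
print(negative)  # [('d', -2.0), ('b', -0.5)]
```

Under this assumption, the lists above would be the ten highest and ten lowest entries of the same score vector over the model's full vocabulary.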

Activation density: 0.007%
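A density of 0.007% corresponds to about 7 firing positions per 100,000 tokens. The sketch below assumes density means the share of sampled token positions where the feature's activation is above zero; activation_density and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float]) -> float:
    """Percentage of token positions where the feature activation is above zero."""
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > 0)
    return 100.0 * firing / len(acts)


# 7 firing positions out of 100,000 sampled tokens (hypothetical sample).
sample = [0.0] * 99_993 + [0.8] * 7
print(f"{activation_density(sample):.3f}%")  # 0.007%
```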


No Known Activations