Explanations

Online forum context (np_max-act-logits · gemini-2.0-flash)

Configuration: mwhanna/qwen3-4b-transcoders/layer_27.safetensors
Prompts (Dashboard): 16,384 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted
Features: 163,840
Data Type: float32
Hook Name: blocks.27.mlp.hook_in
Architecture: transcoder
Context Size: 8,192
Dataset: monology/pile-uncopyrighted

Negative Logits

"毛" (hair): -0.39
" stomach": -0.33
" hairs": -0.33
"槽" (trough, groove): -0.31
" craw": -0.29
"脊" (spine): -0.28
"湿" (wet, damp): -0.28
" uneasy": -0.27
" discomfort": -0.26
"闷" (stuffy): -0.26

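Positive Logits

"吸引" (attract): 0.30
"博" (extensive): 0.27
"hop": 0.26
"损伤" (damage, injury): 0.26
"风险" (risk): 0.26
"æĸ" (incomplete multi-byte token): 0.26
"希望能够" (hope to be able to): 0.26
"伤害" (harm): 0.25
" Rape": 0.24
"吸引了" (attracted): 0.24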
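Dashboards built on byte-level BPE vocabularies often show non-ASCII tokens in their raw vocabulary form, where each UTF-8 byte is remapped to a printable character, so 毛 appears as `æ¯Ľ`. The sketch below rebuilds that mapping and reverses it. It assumes the tokenizer uses the GPT-2-style byte-to-unicode table, which this page does not state; the function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build a GPT-2-style table mapping each of the 256 byte values
    to a single printable character (assumed, not confirmed by this page)."""
    printable = set(
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {}
    shift = 0
    for b in range(256):
        if b in printable:
            # Printable bytes map to themselves.
            table[b] = chr(b)
        else:
            # Remaining bytes are shifted to code points 256 and above.
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Reverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text.

    Expects the raw vocabulary form (a leading space would appear as 'Ġ').
    Incomplete UTF-8 sequences become the replacement character U+FFFD.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æ¯Ľ", "åĲ¸å¼ķ", "æĸ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

A token such as `æĸ` covers only two bytes of a three-byte character, so it cannot be decoded on its own and comes out as the replacement character.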

Activations Density 0.000%

No Known Activations