Explanations

resist/deny

np_max-act · gemini-2.0-flash

Configuration

mwhanna/qwen3-4b-transcoders/layer_7.safetensors

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Features

163,840

Data Type

float32

Hook Name

blocks.7.mlp.hook_in

Architecture

transcoder

Context Size

8,192

Dataset

monology/pile-uncopyrighted

Negative Logits

登

-0.26

 Mister

-0.26

 experiment

-0.25

练

-0.25

 Pose

-0.25

实验

-0.25

abelle

-0.24

 Everything

-0.24

-env

-0.23

experiment

-0.23

Positive Logits

_BINDING

0.26

都不是

0.26

arah

0.25

瘦

0.25

再到

0.24

去过

0.24

+i

0.24

都不

0.24

虱

0.24

witch

0.23
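
Byte-level BPE vocabularies store non-ASCII tokens in a byte-mapped form, so a token such as 实验 appears in the raw vocabulary as `å®ŀéªĮ`. The sketch below shows one way to convert the raw form back to readable text. It assumes the model's tokenizer uses the GPT-2 style byte-to-unicode table, which this page does not state; `bytes_to_unicode` and `decode_vocab_token` are illustrative names, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build a GPT-2 style byte -> printable-character table (assumed scheme)."""
    # Bytes that are already printable map to themselves.
    keep = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    table = {b: chr(b) for b in keep}
    # Every remaining byte is assigned a code point from U+0100 upward, in order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Reverse lookup: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_vocab_token(raw: str) -> str:
    """Convert a raw vocabulary string back to the text it encodes."""
    out = bytearray()
    for ch in raw:
        if ch in CHAR_TO_BYTE:
            out.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through.
            out.extend(ch.encode("utf-8"))
    # Partial multi-byte tokens are shown with replacement characters.
    return out.decode("utf-8", errors="replace")


print(decode_vocab_token("å®ŀéªĮ"))   # 实验
print(decode_vocab_token(" Mister"))  # " Mister" (unchanged)
```

Tokens that hold only part of a multi-byte character cannot be decoded on their own, which is why the sketch decodes with `errors="replace"` rather than raising.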

Activations Density 0.001%
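
For a sense of scale, the density can be converted into an approximate token count. This is a rough sketch that assumes the density is measured over the dashboard prompt set listed under Configuration (16,384 prompts of 128 tokens each); the page does not say which token set the figure refers to, and the displayed percentage is rounded.

```python
# Dashboard prompt set, taken from the Configuration values on this page.
num_prompts = 16_384
tokens_per_prompt = 128

# Displayed activation density as a fraction (0.001% -> 0.00001).
density = 0.001 / 100

total_tokens = num_prompts * tokens_per_prompt
approx_active_tokens = total_tokens * density

print(total_tokens)                 # 2097152
print(round(approx_active_tokens))  # 21
```

Under that assumption the feature would fire on roughly twenty of about two million tokens, so very few example activations should be expected.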

No Known Activations