Explanations

what starts questions

np_acts-logits-general · gemini-2.5-flash-lite

what

np_max-act-logits · claude-4-5-sonnet Triggered by @sk5695

question words

np_max-act · claude-4-5-sonnet Triggered by @sk5695

Configuration: google/gemma-scope-2-27b-pt/resid_post/layer_31_width_16k_l0_medium

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

 estejam    0.25
 zowel      0.24
 mesmas     0.24
 stets      0.24
 있지만     0.23
 jedynie    0.23
 देखील      0.23
雖          0.23
 虽然       0.23
 zwar       0.22

Positive Logits

            0.37
？          0.33
难道        0.33
はどう      0.32
?";         0.32
?“          0.32
 réellement 0.32
…?          0.32
What        0.31
is          0.31

Activations Density 1.752%
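
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes that the 1.752% density is measured over the same dashboard prompt set listed above (392,802 prompts of 256 tokens each); the page does not state this explicitly, and the function names are illustrative only.

```python
# Figures copied from the dashboard; how they relate is an assumption.
NUM_PROMPTS = 392_802          # prompts in the dashboard set
TOKENS_PER_PROMPT = 256        # tokens per prompt
ACTIVATION_DENSITY = 0.01752   # 1.752%, expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(n_tokens: int, density: float) -> int:
    """Approximate count of positions where the feature fires,
    assuming density = active positions / all positions."""
    return round(n_tokens * density)


if __name__ == "__main__":
    n = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    print(f"total tokens: {n:,}")
    print(f"estimated active tokens: {estimated_active_tokens(n, ACTIVATION_DENSITY):,}")
```

Under that assumption the feature would be active on roughly 1.76 million of about 100.6 million token positions.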

No Known Activations