Explanations

and despite (np_acts-logits-general · gemini-2.5-flash-lite)

pronouns referring to a group (np_acts-logits-general · gemini-2.5-flash-lite)

they (np_max-act-logits · claude-4-5-sonnet, triggered by @sripadkarne)

Configuration

google/gemma-scope-2-27b-pt/resid_post/layer_40_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" আমরা"    0.70
" हमने"    0.68
" मैंने"    0.60
"我們先"    0.60
"我们就"    0.58
" ನಾವು"    0.58
" నేను"    0.58
" ನಾನು"    0.57
" 우리는"    0.57
"我們會"    0.57

Positive Logits

"他们"    1.81
"他們"    1.75
"They"    1.73
"they"    1.72
" they"    1.70
" они"    1.70
" 他们"    1.68
" họ"    1.67
" They"    1.64
" вони"    1.57

Activations Density 0.455%
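
As a rough illustration of the density figure, the sketch below assumes "activations density" means the fraction of token positions on which the feature has a nonzero activation. That reading, the function name `activation_density`, and the toy activation values are assumptions made for illustration and are not taken from the dashboard. The prompt count and token length are the ones listed under Prompts (Dashboard).

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    `activations` is a list of per-prompt lists of feature activations.
    Assumption: density is counted per token position, not per prompt.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Hypothetical toy data: 2 prompts of 4 tokens, one position active.
toy = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.0, 0.0, 0.0]]
density = activation_density(toy)

# Scale of the dashboard run: 392,802 prompts of 256 tokens each.
total_tokens = 392_802 * 256

# Under the per-token assumption, 0.455% of positions would be active.
approx_active_tokens = round(total_tokens * 0.455 / 100)

print(f"toy density: {density:.3%}")
print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens at 0.455%: {approx_active_tokens:,}")
```

If the dashboard computes density over a sample of the prompts rather than all of them, the token count above would overstate the number of active positions.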
