Neuronpedia

Explanations

end of time periods

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

-1.76

３

-1.59

-1.55

人

-1.55

Ｄ

-1.54

 registró

-1.51

왑

-1.48

UCION

-1.47

Didn

-1.45

Doing

-1.42

Positive Logits

the

1.82

簠

1.66

 freaking

1.48

袿

1.45

 their

1.43

 見える

1.41

 this

1.39

şiv

1.38

 大きい

1.38

FOREWORD

1.32

Activations Density 0.011%

No Known Activations

© Neuronpedia 2026
