Neuronpedia

Explanations

programming concepts

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_40_width_262k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token  Logit
" vaše"  0.45
"-']."  0.44
" என்பவர்"  0.43
"ისა"  0.42
" 이상의"  0.42
"으로서"  0.42
" آنچه"  0.42
"яў"  0.40
"Dieser"  0.40
"urier"  0.39

Positive Logits

Token  Logit
"↵"  0.72
"‬"  0.56
" using"  0.55
"=)"  0.47
" 👇"  0.46
" before"  0.46
" avoiding"  0.46
"-->"  0.45
" without"  0.45
"*/}"  0.45

Activations Density 0.035%

No Known Activations

© Neuronpedia 2025
