Neuronpedia

Explanations

code comments and declarations

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_34/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token                 Logit
>>>                   -1.01
(not rendered)        -0.96
“                     -0.96
(not rendered)        -0.94
‘                     -0.88
 складу               -0.88
 Гар                  -0.82
 había                -0.81
伊豆                  -0.81
 envision             -0.81

Positive Logits

Token                 Logit
……"                   1.15
…"                    1.12
-//                   1.01
()"                   1.00
ꦠ                     0.96
 ..."                 0.96
!"                    0.96
=="                   0.95
orsing                0.95
amate                 0.94
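
The page does not say how the negative and positive logit lists above are produced. A common approach for sparse autoencoder features, assumed here rather than confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then rank tokens by the result. The sketch below illustrates that idea with made-up two-dimensional toy vectors and placeholder tokens; `logit_effects` and `top_k` are hypothetical helper names, not part of any Neuronpedia API.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_direction: List[float],
                  unembed_rows: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(decoder_direction, row))
        for token, row in unembed_rows.items()
    }

def top_k(effects: Dict[str, float], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or most negative) effect."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)[:k]

# Toy numbers for illustration only; they are NOT taken from the dashboard.
decoder_direction = [1.0, -0.5]
unembed_rows = {
    "tok_a": [1.0, -0.5],   # aligned with the feature direction
    "tok_b": [-1.0, 0.5],   # opposed to the feature direction
    "tok_c": [0.0, 0.0],    # unaffected by the feature
}

effects = logit_effects(decoder_direction, unembed_rows)
print(top_k(effects, 1, positive=True))   # most promoted token
print(top_k(effects, 1, positive=False))  # most suppressed token
```

Under this assumption, a token with a positive value is one the feature pushes the model toward predicting, and a token with a negative value is one it pushes away from.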

Activations Density 0.085%
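
To put the density figure in perspective, the arithmetic below estimates how many tokens it would correspond to. This is a rough sketch that assumes the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration (24,576 prompts of 128 tokens each); the page states neither. The function names are hypothetical.

```python
# Figures copied from the dashboard; the interpretation of "density" is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.085

def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt

def expected_active_tokens(total: int, density_percent: float) -> float:
    """Number of token positions implied by a density given in percent."""
    return total * density_percent / 100

total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(total)          # 3145728
print(round(active))  # 2674
```

Under these assumptions the feature would fire on roughly 2,700 of about 3.1 million token positions.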

No Known Activations

© Neuronpedia 2025
