Neuronpedia

Explanations

a definition

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted
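
As a rough illustration of the scale implied by the prompt configuration above, the following minimal sketch multiplies the two listed figures. The variable and function names are hypothetical, and it assumes every prompt is exactly 128 tokens long, which the page implies but does not otherwise confirm.

```python
# Figures copied from the dashboard configuration; names are illustrative only.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128


def total_dashboard_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Token positions covered, assuming every prompt is exactly full length."""
    return num_prompts * tokens_per_prompt


total_tokens = total_dashboard_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
print(f"{total_tokens:,} token positions")  # 3,145,728 token positions
```

Under that assumption, the dashboard covers roughly 3.1 million token positions.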

Negative Logits

-2.69

-2.59

-2.52

-2.42

-2.38

-2.38

-2.31

-2.22

and

-2.17

-2.08

Positive Logits

3.25

not

2.94

now

2.52

 also

2.11

 Which

2.05

an

2.03

 usually

1.98

 interpreta

1.98

 morfo

1.94

 just

1.93

Activations Density 0.114%

No Known Activations

© Neuronpedia 2026