Neuronpedia

INDEX

Explanations

a/an followed by a noun

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_10/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token  Logit
？？？  -1.93
 them  -1.72
 these  -1.71
↓  -1.65
ších  -1.61
 aand  -1.59
 there  -1.57
～～～  -1.51
俢  -1.49
 chande  -1.47

Positive Logits

Token  Logit
to  2.25
—  1.95
—  1.85
</h5>  1.76
 Another  1.76
",  1.70
–  1.69
 fevereiro  1.64
You  1.63
 usual  1.61

Activations Density 0.092%
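
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes that the activation density is the share of dashboard tokens on which the feature has a non-zero activation, and that it was measured over the 24,576 prompts of 128 tokens listed under Configuration. The page states neither explicitly, and the variable names are illustrative only. Because 0.092% is a rounded value, the result is a rough estimate.

```python
# Figures copied from the dashboard configuration on this page.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.092  # "Activations Density"

# Total number of tokens in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = fraction of these tokens with non-zero activation.
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens}")
print(f"estimated active tokens: {round(estimated_active_tokens)}")
```

Under these assumptions the feature fires on roughly 2,900 of about 3.1 million tokens.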

No Known Activations

© Neuronpedia 2026