Neuronpedia



Explanations

Names and places

np_max-act · gemini-2.0-flash


Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted


Negative Logits

Token          Logit
"Ro"           -0.83
"Sa"           -0.73
"Or"           -0.67
"Tr"           -0.65
(not shown)    -0.63
(not shown)    -0.61
"or"           -0.60
"Ho"           -0.60
"Bar"          -0.58
(not shown)    -0.57

Positive Logits

Token          Logit
"ſelves"       1.10
" pleaſure"    1.10
"ſelf"         1.07
" Monfieur"    1.07
" myſelf"      1.05
" Majefty"     1.01
" purpoſe"     1.00
" ſmall"       1.00
" itſelf"      0.99
" cauſe"       0.97

Activations Density 0.156%

No Known Activations
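
The dashboard figures can be combined into a rough sanity check of what the activation density implies. The sketch below is illustrative only: it assumes that "Activations Density" means the fraction of dashboard token positions on which the feature is non-zero, which this page does not state, and the function names are hypothetical. Since the page lists no known activations, the result is an estimate of what the density would imply, not a count of observed examples.

```python
# Figures copied from the dashboard configuration and density line on this page.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.156


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated number of token positions where the feature fires.

    ASSUMPTION: density = active token positions / total token positions.
    The page itself does not define the metric.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total token positions: {total:,}")
print(f"estimated active positions: {active:,.0f}")
```

Under that assumption, the feature would fire on a few thousand token positions out of roughly 2.1 million, which is consistent with a sparse feature.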

© Neuronpedia 2025
