Neuronpedia

Explanations

Reagan

np_max-act · gemini-2.0-flash

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

pleaſure    -1.52
myſelf      -1.48
purpoſe     -1.48
Anſ         -1.47
Monfieur    -1.46
ſeveral     -1.40
reaſon      -1.38
ſtate       -1.37
itſelf      -1.36
Reſ         -1.34

Positive Logits

or                      0.87
for                     0.84
and                     0.84
dem                     0.84
[token not rendered]    0.82
[token not rendered]    0.81
vol                     0.81
in                      0.80
an                      0.80
[token not rendered]    0.80

Activations Density 0.071%

No Known Activations

© Neuronpedia 2025