Neuronpedia

Explanations

southwest

np_max-act · gemini-2.0-flash

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" auroit"     -1.04
" feroit"     -0.98
"Efq"         -0.97
" avoient"    -0.96
" Majefty"    -0.96
" étoit"      -0.94
" étoient"    -0.94
" myſelf"     -0.93
" sfeer"      -0.92
" Landscape"  -0.91

Positive Logits

"As"            0.52
"tk"            0.49
"Sel"           0.48
"De"            0.47
"Sa"            0.47
" MotionEvent"  0.46
" issue"        0.46
"<bos>"         0.45
[missing]       0.44
"af"            0.44

Activations Density 0.068%

No Known Activations
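
As a rough sanity check on the figures above, the sketch below multiplies out the dashboard configuration (16,384 prompts of 128 tokens each) and applies the displayed density of 0.068%. It assumes, without confirmation from this page, that the density is the fraction of dashboard tokens on which the feature has a nonzero activation. The function and constant names are illustrative only and are not part of any Neuronpedia API.

```python
# Figures copied from the dashboard configuration and density line on this page.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.068  # "Activations Density" as displayed


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the feature fires.

    Assumption: density is the fraction of dashboard tokens with a
    nonzero activation, expressed as a percentage.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"dashboard tokens: {total}")
print(f"estimated activating tokens: {active}")
```

Under that assumption the feature would fire on roughly one token in every 1,500, which is consistent with a sparse feature.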
