Neuronpedia


Explanations

also

np_max-act · gemini-2.0-flash

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
"ſelves"       -0.98
" pleaſure"    -0.97
" Diſ"         -0.96
" Theſe"       -0.96
" ſche"        -0.94
" Monfieur"    -0.94
" itſelf"      -0.94
" Anſ"         -0.92
"QMetaType"    -0.91
"ſelf"         -0.91

Positive Logits

Token          Logit
"the"          0.74
"<bos>"        0.63
" when"        0.55
"at"           0.54
"re"           0.54
"see"          0.52
"as"           0.51
" though"      0.50
" went"        0.50
" from"        0.49

Activations Density 0.168%

No Known Activations
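
To give the 0.168% density a sense of scale, the sketch below converts it into an approximate token count. It assumes (the page does not say so) that density is the share of tokens in the dashboard prompts on which the feature has a nonzero activation, and that it was measured over the full 16,384 x 128 token set. The variable names are illustrative only.

```python
# Figures copied from the dashboard: prompt count, prompt length, density.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.168

# Total tokens across all dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = fraction of those tokens with nonzero activation.
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:,}")
```

Under that assumption the feature would fire on roughly 3,500 of about 2.1 million tokens.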

© Neuronpedia 2025
