Neuronpedia


INDEX

Explanations

world

np_max-act · gemini-2.0-flash


Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

Token       Logit
" world"    -1.38
"world"     -1.07
" WORLD"    -1.06
" World"    -1.01
"World"     -0.91
"WORLD"     -0.89
" 세계"     -0.89
"^(@)"      -0.84
" Welt"     -0.82
" wereld"   -0.82

Positive Logits

Token          Logit
"ly"           0.61
"wide"         0.58
"ваемых"       0.56
"WIDE"         0.53
"ally"         0.52
(not shown)    0.51
"of"           0.48
(not shown)    0.47
(not shown)    0.47
"ingly"        0.47

Activations Density 1.125%
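
As a rough sanity check on the density figure, the sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that density means the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration. The function name `estimated_active_tokens` is hypothetical.

```python
# Assumption: density = fraction of tokens with nonzero activation,
# measured over the dashboard prompt set (16,384 prompts x 128 tokens).
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY = 0.01125  # 1.125%, as reported on the page


def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int, density: float):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total = num_prompts * tokens_per_prompt
    return total, round(total * density)


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY
)
print(f"total tokens: {total_tokens}, estimated active tokens: {active_tokens}")
```

Under those assumptions the feature would fire on roughly 23,600 of about 2.1 million tokens. If the density was computed over a different token sample, the estimate does not apply.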

No Known Activations

© Neuronpedia 2025
