Neuronpedia

Explanations

growth and decline

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token            Logit
".inner"         -0.08
"Situation"      -0.08
"XY"             -0.07
"actions"        -0.07
" viande"        -0.07
"פע"             -0.07
"Empire"         -0.07
" situação"      -0.07
" actions"       -0.07
" directives"    -0.07

Positive Logits

Token            Logit
" peacefully"    0.09
" kakhulu"       0.09
" kanjani"       0.08
"然"             0.08
" sharply"       0.08
" فيه"           0.08
" lentamente"    0.08
" sõlt"          0.08
" ilalim"        0.07
"Seal"           0.07

Activations Density 0.318%

No Known Activations

© Neuronpedia 2025
