Neuronpedia

Explanations

pass

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token              Logit
"lud"              -0.08
" supplemental"    -0.08
"quart"            -0.08
" laughter"        -0.07
" enkel"           -0.07
"مند"              -0.07
"相信"             -0.07
" portefeuille"    -0.07
" mindset"         -0.07
" bouteille"       -0.07

Positive Logits

Token              Logit
" noon"            0.09
" Noon"            0.08
" दुस"             0.08
" 지나"            0.08
" जाण"             0.08
" объ"             0.08
"ivate"            0.07
" komt"            0.07
" landmarks"       0.07
" Callback"        0.07

Activations Density: 0.008%

No Known Activations

© Neuronpedia 2025
