Neuronpedia

INDEX

Explanations

think

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

 ####    -0.07
 behaupt    -0.07
ιστα    -0.07
 செய்யப்பட்ட    -0.07
")    -0.07
.Assert    -0.07
Claims    -0.07
 mint    -0.07
illustr    -0.07
 motivate    -0.07

Positive Logits

ว่าจะ    0.10
最佳    0.09
 optimal    0.08
 بهترین    0.08
 besar    0.08
 tentang    0.08
 về    0.08
 najleps    0.08
 mejores    0.08
 Optimal    0.08

Activations Density 0.170%

No Known Activations

© Neuronpedia 2026
