Neuronpedia

Explanations

citations

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token            Logit
" સાચ"           -0.09
" handing"       -0.08
" योग्य"          -0.08
"属于"           -0.08
" projectile"    -0.08
"ిపోయ"           -0.08
"容易"           -0.08
" nominal"       -0.08
" radioactive"   -0.08
"Ј"              -0.08

Positive Logits

Token            Logit
" Kumar"         0.09
"Liu"            0.09
"WHO"            0.08
"习近平"         0.08
" Dimit"         0.08
" Nagar"         0.08
" SCORE"         0.08
"Xu"             0.08
" Friedman"      0.08
"Fus"            0.08

Activations Density 0.014%

No Known Activations

© Neuronpedia 2025
