Neuronpedia

INDEX

Explanations

Code and layout elements

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token             Logit
"тер"             -0.09
" temporal"       -0.08
" chronological"  -0.08
" ಮಾರ"            -0.08
"ped"             -0.08
" metaphor"       -0.08
"olon"            -0.08
"и"               -0.07
" senator"        -0.07
"osi"             -0.07

Positive Logits

Token              Logit
"Fen"              0.08
"Kek"              0.08
" ausgeschlossen"  0.07
" restricciones"   0.07
"、自"              0.07
"(calc"            0.07
" rappel"          0.07
" Constraints"     0.07
"Restriction"      0.07
"ren"              0.07

Activations Density 0.003%

No Known Activations

© Neuronpedia 2025
