Neuronpedia


Explanations

:

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_15/trainer_0

Dataset (Dashboard)

Various

Negative Logits

.",          -0.09
”).↵↵        -0.09
.",↵         -0.08
”.↵↵         -0.08
.",↵         -0.08
 Lastly      -0.08
.”↵↵         -0.08
”).          -0.08
 dialogue    -0.08
."),↵        -0.08

Positive Logits

↵            0.10
 প্রক         0.08
 Traditionally 0.08
 doorga      0.08
 typically   0.08
↵            0.08
%↵           0.08
 Typically   0.08
 presumably  0.08
：↵           0.08

Activations Density 0.110%

No Known Activations

© Neuronpedia 2025
