Neuronpedia



Explanations

commas
(method: np_max-act-logits · explainer model: gemini-2.0-flash)


Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_15/trainer_0

Dataset (dashboard): Various


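Negative Logits

Kor         -0.09
 ಯಾವುದೇ      -0.08  (Kannada: "any")
 कुनै         -0.08  (Nepali: "any")
dan         -0.08
 рад        -0.08  (Russian: "glad")
 한국        -0.08  (Korean: "Korea")
\">\        -0.08
 नेपाल        -0.08  (Nepali/Hindi: "Nepal")
 любого     -0.07  (Russian: "any")
 Nelson     -0.07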

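Positive Logits

etc         0.14
...),       0.11
,etc        0.11
等等        0.10  (Chinese: "etc.")
 וכו        0.10  (Hebrew abbreviation: "etc.")
 વગેરે       0.10  (Gujarati: "etc.")
usw         0.10  (German abbreviation: "etc.")
 ...↵↵      0.10
vs          0.10
 ....↵↵     0.10

(↵ marks a newline within a token.)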
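Lists like the Negative Logits and Positive Logits above are commonly produced by projecting a feature's SAE decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below shows that computation on toy numbers. It assumes a plain direct projection (Neuronpedia's actual pipeline may, for example, handle the final layer norm differently), and the function names, matrices, and vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: assumes a feature's logit effect is its decoder
# direction projected through the unembedding matrix (no layer-norm handling).
def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token, i.e. decoder_row @ unembed.

    decoder_row has length d_model; unembed is a d_model x vocab_size matrix.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_row, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with d_model = 2 and a four-token vocabulary (made-up numbers).
tokens = ["etc", "usw", "Kor", "the"]
decoder_row = [1.0, 0.5]
unembed = [
    [0.10, 0.08, -0.05, 0.00],
    [0.04, 0.04, -0.08, -0.02],
]
scores = logit_effects(decoder_row, unembed)
positive, negative = top_and_bottom(scores, tokens, k=2)
print("Positive:", [(t, round(s, 2)) for t, s in positive])
print("Negative:", [(t, round(s, 2)) for t, s in negative])
```

A real model would use its full unembedding matrix and k = 10 to match the lists on this page; the pure-Python loops here keep the sketch dependency-free.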

Activation density: 0.058%

No Known Activations
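Activation density is the share of token positions on which the feature fires. A value of 0.058% works out to roughly 58 of every 100,000 tokens. The minimal sketch below assumes that "fires" means an activation strictly above zero; this page does not state the threshold Neuronpedia actually uses. The function name and data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: a position counts as active when activation > threshold (0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Made-up example: 58 firing positions out of 100,000 tokens.
activations = [0.0] * 100_000
for i in range(58):
    activations[i * 1_000] = 1.0  # hypothetical nonzero activations
density = activation_density(activations)
print(f"Activation density: {density:.3%}")
```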

© Neuronpedia 2025
