Neuronpedia

Explanations

percentage of the time

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

" skjer"     -0.08
"ahanan"     -0.08
" sounding"  -0.08
" Grave"     -0.08
" القي"      -0.08
" חיים"      -0.08
"styr"       -0.07
"വ്"         -0.07
"vý"         -0.07
" القدرة"    -0.07

Positive Logits

"icamente"   0.09
"ifrån"      0.09
" veces"     0.09
"ратно"      0.09
" kerran"    0.09
"istically"  0.08
"ๆ"          0.08
" allá"      0.08
"రకు"        0.08
"적으로"      0.08

Activations Density: 0.395%

No Known Activations

© Neuronpedia 2025