Neuronpedia

Explanations

inner thoughts/feelings

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

"_CALL"  -0.08
"-duration"  -0.08
"-util"  -0.07
"making"  -0.07
" Landsc"  -0.07
(token missing in source)  -0.07
"以来"  -0.07
" landscapes"  -0.07
" temporal"  -0.07
"_call"  -0.07

Positive Logits

" underneath"  0.12
" underlying"  0.10
" وراء"  0.10
" sebenarnya"  0.09
" daadwerk"  0.09
" сути"  0.09
"と"  0.09
" Somehow"  0.09
" werkelijkheid"  0.09
"实际上"  0.09

Activation Density: 0.030%

No Known Activations

© Neuronpedia 2025