Neuronpedia



Explanations

mathematics (np_max-act-logits · gemini-2.0-flash)


Configuration: andyrdt/saes-gpt-oss-20b/resid_post_layer_15/trainer_0

Dataset (Dashboard): Various


Negative Logits

" //----------------------------------------------------------------"  -0.09
" //{↵"  -0.08
"ệu"  -0.08
" वर्षीय"  -0.08
"дання"  -0.08
"त्न"  -0.08
"сті"  -0.08
"benhavn"  -0.08
"ocz"  -0.07
"！！↵↵"  -0.07

Positive Logits

"属于"  0.11
"作为"  0.11
"属"  0.11
" onderdeel"  0.10
"也是"  0.10
" merupakan"  0.10
"(="  0.10
" guise"  0.10
"noun"  0.10
"subset"  0.10

Activation Density: 0.125%

No Known Activations

© Neuronpedia 2025
