Neuronpedia

Explanations

Square root equations

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_15/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token  Logit
" fares"  -0.09
" confines"  -0.09
" stalls"  -0.08
" enää"  -0.08
" fare"  -0.08
"pens"  -0.08
" مغ"  -0.08
"heart"  -0.08
"wolves"  -0.08
" أع"  -0.08

Positive Logits

Token  Logit
" positive"  0.10
" atan"  0.09
" positif"  0.09
" positieve"  0.09
" порядок"  0.08
" Positive"  0.08
" направление"  0.08
" inverse"  0.08
" অনুয"  0.08
"阳"  0.08

Activations Density 0.007%

No Known Activations

© Neuronpedia 2025