Neuronpedia


Explanations

logical contrapositives

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token            Logit
" pioneer"       -0.09
" erforder"      -0.08
" emuls"         -0.07
" endgült"       -0.07
"acht"           -0.07
" Valencia"      -0.07
" Genel"         -0.07
"rug"            -0.07
" Freelancer"    -0.07
"agu"            -0.07

Positive Logits

Token            Logit
"CWE"            0.08
"訳"             0.08
" closures"      0.08
" reversed"      0.08
" closure"       0.08
" alex"          0.08
"近平"           0.08
" transpose"     0.08
" verbs"         0.08
" anton"         0.08

Activations Density 0.013%

No Known Activations

© Neuronpedia 2025
