Neuronpedia

INDEX

Explanations

AI safety

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token        Logit
 구매         -0.09   (Korean: "purchase")
 buyer       -0.09
 underline   -0.08
 compras     -0.08   (Spanish/Portuguese: "purchases")
 underscore  -0.08
 blem        -0.08
 strike      -0.08
 olive       -0.08
 purchases   -0.07
BUY          -0.07

Positive Logits

Token        Logit
GPT          0.12
GPT          0.12
伦理          0.10    (Chinese: "ethics")
 dangerously 0.10
 perigos     0.10    (Portuguese: "dangers")
 epistem     0.10
 dangers     0.10
 cognitive   0.10
 dangerous   0.10
危险          0.10    (Chinese: "danger" / "dangerous")

Activation Density: 0.020%

No Known Activations

© Neuronpedia 2025