Neuronpedia

Explanations

divide

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token           Logit
 adults         -0.08
 geschlossen    -0.08
 receptionist   -0.08
植              -0.08
成人            -0.07
 plantar        -0.07
JAN             -0.07
 newer          -0.07
 adult          -0.07
 swear          -0.07

Positive Logits

Token           Logit
 denominator    0.10
 denomin        0.09
excluded        0.08
 precar         0.08
onneur          0.08
 precautions    0.08
 delicate       0.08
 avoided        0.08
iable           0.08
icen            0.08

Activations Density 0.022%

No Known Activations

© Neuronpedia 2025
