Neuronpedia



Explanation: circuits and faults

Explanation method: np_max-act (explainer model: gemini-2.0-flash)


Configuration: andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard): Various


Negative Logits

" ausgezeichnet" (German: "excellent")    -0.08
"bos"                                     -0.07
" hormon"                                 -0.07
" mastering"                              -0.07
".annotations"                            -0.07
" \↵"                                     -0.07
".educ"                                   -0.07
"_caption"                                -0.07
"либ"                                     -0.07
" DONE"                                   -0.07

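Positive Logits

" ആക്രമ" (Malayalam, start of "attack")             0.10
" outage"                                           0.10
" दुर्घ" (Hindi, start of "accident")               0.09
"事故" (Chinese/Japanese: "accident")                 0.09
" Attack"                                           0.09
" catastrophic"                                     0.09
" potência" (Portuguese: "power")                   0.08
" авар" (Russian, start of "accident/breakdown")    0.08
" escalation"                                       0.08
" Unfall" (German: "accident")                      0.08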

Activation density: 0.004%

No Known Activations
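Activation density is usually reported as the share of sampled tokens on which a feature fires. Assuming that definition, and a firing threshold of zero (also an assumption), a density of 0.004% works out to 4 × 10⁻⁵, or about 4 firing tokens per 100,000. The hypothetical `activation_density` helper below reproduces that arithmetic on synthetic activations.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    The zero default threshold is an assumption of this sketch.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Synthetic example: 100,000 tokens, the feature fires on 4 of them.
acts = [0.0] * 100_000
for idx in (10, 20_000, 55_555, 99_999):
    acts[idx] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.004%
```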

© Neuronpedia 2025
