Neuronpedia

Explanations

clinic

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_3/trainer_0

Dataset (Dashboard)

Various

Negative Logits

" appropr"     -0.09
"NAM"          -0.08
" прибор"      -0.08
"٨"            -0.08
"ക്സ"           -0.08
"Proposal"     -0.08
" المناسبة"    -0.08
"ട്ട"           -0.08
"റ്റ്"           -0.07
"jian"         -0.07

Positive Logits

" Ruhe"          0.08
" rodz"          0.07
" gedrag"        0.07
"\tdisplay"      0.07
" vicious"       0.07
" rote"          0.07
" Commun"        0.07
"\ttrans"        0.07
" difficultés"   0.07
" habitudes"     0.07

Activations Density 0.003%

No Known Activations

© Neuronpedia 2025