Neuronpedia

Explanations

numbers

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token                 Logit
" баб"                -0.08
" melding"            -0.08
"老太"                -0.08
"жди"                 -0.07
"telling"             -0.07
" прор"               -0.07
" батар"              -0.07
" Persönlichkeit"     -0.07
" trustworthy"        -0.07
" этап"               -0.07

Positive Logits

Token                 Logit
" approx"             0.08
                      0.08
" humorous"           0.08
" approximately"      0.08
" يص"                 0.08
"Cham"                0.08
" Guitar"             0.08
"ully"                0.07
"_month"              0.07
"Anthony"             0.07

Activations Density 0.122%

No Known Activations

© Neuronpedia 2025
