Neuronpedia

INDEX

Explanations

hypothesis

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

" repent"          -0.08
"irge"             -0.08
" ular"            -0.08
" devote"          -0.08
" perempuan"       -0.08
"CCI"              -0.08
" минист"          -0.08
"оратив"           -0.07
" eingerichtet"    -0.07
" ома"             -0.07

Positive Logits

" hypotheses"      0.15
" hypothesis"      0.13
" hypoth"          0.11
" לגבי"            0.10
" regarding"       0.10
" beliefs"         0.09
" premises"        0.09
" conject"         0.09
" assumptions"     0.09
" بشأن"            0.09

Activations Density 0.015%

No Known Activations

© Neuronpedia 2025
