Neuronpedia



Explanations

Default deny

np_max-act · gemini-2.0-flash


Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various


Negative Logits

 recognizable    -0.09
ണക്ക    -0.08
IMPORTANT    -0.08
ണ    -0.08
 담당    -0.08
human    -0.08
وند    -0.08
dik    -0.08
uman    -0.07
\Schema    -0.07

Positive Logits

 positivo    0.08
yez    0.08
-ja    0.08
-positive    0.08
 voto    0.08
 yima    0.08
(children    0.07
 الطبيعي    0.07
_positive    0.07
_Default    0.07

Activations Density 0.002%

No Known Activations

© Neuronpedia 2025
