Neuronpedia

Explanations

say "mod"

np_max-act-logits · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_15/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token          Logit
"adat"         -0.08
"，实现"        -0.07
" analog"      -0.07
"ot"           -0.07
"det"          -0.07
"Trad"         -0.07
" trilogy"     -0.07
".replace"     -0.07
".set"         -0.07
" éton"        -0.07

Positive Logits

Token              Logit
" Keine"           0.11
" Weitere"         0.10
" અત્યાર"           0.10
" противопоказ"    0.09
"查看更多"          0.09
" ఇత"              0.09
" Пока"            0.09
" الأخير"          0.09
"搜"               0.09
" kleinere"        0.09

Activations Density: 0.003%

No Known Activations

© Neuronpedia 2025
