Neuronpedia


INDEX

Explanations

contains the word "английский" (Russian for "English")

np_max-act · gemini-2.0-flash


Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various


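Negative Logits

"имого"  -0.08
"，其中"  -0.08  (Chinese: ", among which")
"龙江"  -0.08  (Chinese: "Longjiang", as in 黑龙江 Heilongjiang)
"sul"  -0.08
"。此外"  -0.08  (Chinese: ". In addition")
"。然而"  -0.07  (Chinese: ". However")
" 修改"  -0.07  (Chinese: "modify")
" articulated"  -0.07
" Frontier"  -0.07
"iedo"  -0.07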

Positive Logits

" correcto"  0.08  (Spanish: "correct")
" castell"  0.08
"bä"  0.08
"portable"  0.08
"-remove"  0.07
" correcta"  0.07  (Spanish: "correct", feminine)
" Consent"  0.07
"tum"  0.07
" Amma"  0.07
"ppa"  0.07
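The two logit lists show which output tokens this feature pushes up or down most directly. A minimal sketch of how such lists can be produced, assuming (this page does not confirm it) that each score is the feature's decoder vector multiplied by the model's unembedding matrix, followed by sorting. The helper `top_logits`, the token names, and the toy 2-dimensional matrices are hypothetical; the real gpt-oss-20b dimensions are far larger.

```python
def top_logits(decoder_vec, unembed, vocab, k=3):
    """Project one SAE feature's decoder direction onto the vocabulary.

    Assumption: score for token j = sum_i decoder_vec[i] * unembed[i][j],
    i.e. the decoder row times the unembedding matrix W_U (d_model x vocab).
    Returns (top_positive, top_negative) as lists of (token, score).
    """
    d_model = len(decoder_vec)
    # One score per vocabulary entry: dot product of the decoder vector
    # with the corresponding column of the unembedding matrix.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: most negative first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy example with hypothetical values (d_model = 2, vocabulary of 4 tokens).
decoder = [1.0, -0.5]
W_U = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.1],
]
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
pos, neg = top_logits(decoder, W_U, tokens, k=2)
print("positive:", [tok for tok, _ in pos])  # positive: ['tok_d', 'tok_c']
print("negative:", [tok for tok, _ in neg])  # negative: ['tok_b', 'tok_a']
```

Because the projection skips every later layer, scores like these describe only the feature's direct effect on the output, which is one reason they can look unrelated to the explanation (here, the word "английский").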

Activation Density: 0.004%

No Known Activations
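Activation density is the share of tokens on which the feature fires. 0.004% works out to roughly 4 tokens in every 100,000. A sketch, assuming density counts token positions with an activation strictly above zero over some sample of text (this page does not say how that sample is drawn). `activation_density` is a hypothetical helper, and the activation values are made up.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: "firing" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 4 nonzero activations among 100,000 token positions.
sample = [0.0] * 99_996 + [1.2, 0.7, 3.1, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints 0.004%
```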

© Neuronpedia 2025
