Neuronpedia

Explanations

internet security

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

".Cursor"  -0.09
" ощущения"  -0.08
"esom"  -0.08
" weekly"  -0.08
" Weekly"  -0.08
" सप्ताह"  -0.07
" הדברים"  -0.07
" чаще"  -0.07
" semanal"  -0.07
" häufig"  -0.07

Positive Logits

" malicious"  0.13
"窥"  0.11
" unauthorized"  0.11
" зло"  0.10
" anyone"  0.10
"偷窥"  0.10
" someone"  0.10
"誰"  0.10
"Unauthorized"  0.10
"非法"  0.10

Activations Density 0.024%

No Known Activations

© Neuronpedia 2025
