Neuronpedia


INDEX

Explanations

1

np_max-act-logits · gemini-2.0-flash

Configuration: andyrdt/saes-gpt-oss-20b/resid_post_layer_15/trainer_0

Dataset (Dashboard): Various

Negative Logits

"工作的"  -0.09
"性的"  -0.08
"Onde"  -0.08
"我是"  -0.08
"发展的"  -0.08
"on's"  -0.08
"爱的"  -0.07
" való"  -0.07
(no token shown)  -0.07
"amic"  -0.07

Positive Logits

" шту"  0.13
" экземпля"  0.10
" ได้แก่"  0.10
"，需要"  0.10
" എണ്ണം"  0.10
(no token shown)  0.09
" ഉണ്ട"  0.09
"՝"  0.09
"，同比增长"  0.09
" ခု"  0.09

Activations Density 0.018%

No Known Activations

© Neuronpedia 2025
