Neuronpedia

Explanations

Dialogue and gratitude

np_max-act · gemini-2.0-flash

Configuration

andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0

Dataset (Dashboard)

Various

Negative Logits

Token          Logit
"_REGEX"       -0.08
"_SIG"         -0.08
"Unlike"       -0.08
"_Rect"        -0.08
" Unlike"      -0.08
"்"            -0.08
"_START"       -0.08
"'arrêt"       -0.08
" violently"   -0.07
"_AS"          -0.07

Positive Logits

Token          Logit
" thank"       0.13
" thanking"    0.13
"THANK"        0.13
" THANK"       0.12
" teşekkür"    0.12
" gratitude"   0.11
" നന്ദ"        0.11
"Спасибо"      0.11
" спасибо"     0.11
" תודה"        0.11

Activations Density 0.029%

No Known Activations
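
To put the 0.029% activation density in concrete terms, here is a minimal sketch. It assumes density means the fraction of tokens on which the feature has a nonzero activation; that definition is an assumption, not something stated on this page. The helper name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature is active.

    Assumes density is the share of tokens with nonzero activation
    (an assumption; the dashboard does not define it here).
    """
    return density_percent / 100.0 * n_tokens


# Density value as reported on the dashboard.
DENSITY_PERCENT = 0.029

# Expected number of active tokens in a sample of one million tokens.
per_million = expected_active_tokens(DENSITY_PERCENT, 1_000_000)
print(f"about {round(per_million)} active tokens per million")
```

Under that assumption, the feature is sparse: it would fire on roughly three tokens in every ten thousand.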

© Neuronpedia 2025
