Neuronpedia

Explanations

modified

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token  Logit
為に  -1.42
(blank)  -1.27
 utaf  -1.22
マジで  -1.22
事で  -1.19
ようになります  -1.17
マイナス  -1.14
その他の  -1.13
景象  -1.12
達が  -1.10

Positive Logits

Token  Logit
！』  1.37
ЕР  1.34
。』  1.33
や  1.32
 Proses  1.30
tiet  1.29
dagogik  1.29
dü  1.25
 скриншот  1.25
beforeEach  1.24

Activations Density 0.291%

No Known Activations

© Neuronpedia 2026