Neuronpedia

Explanations

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token  Logit
"🫅"  -2.88
"埽"  -2.84
"筇"  -2.50
"圌"  -2.50
"糨"  -2.38
"י"  -2.38
"涠"  -2.30
"蔸"  -2.28
" identifiés"  -2.25
"腘"  -2.25

Positive Logits

Token  Logit
"what"  3.00
"ୌ"  2.67
" unparalleled"  2.61
"ization"  2.58
(token not rendered)  2.58
" three"  2.50
"lossians"  2.45
(token not rendered)  2.44
" tangible"  2.42
" longtime"  2.39

Activations Density 0.013%

No Known Activations
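
The dashboard figures above can be related with a little arithmetic. The sketch below is illustrative only: it multiplies the prompt count by the prompt length to get the size of the dashboard token set, then applies the reported density. It assumes that "Activations Density" means the fraction of token positions on which the feature is nonzero, and that it was measured on this same prompt set. Neither assumption is stated on the page, and the function names are hypothetical.

```python
# Figures copied from the dashboard above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.013  # "Activations Density 0.013%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def implied_active_tokens(total: int, density_percent: float) -> float:
    """Token positions implied by a density given in percent.

    ASSUMPTION: density is the fraction of token positions with a
    nonzero activation, measured over the same prompt set.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = implied_active_tokens(total, DENSITY_PERCENT)

print(f"token positions: {total:,}")
print(f"implied active positions: {active:.1f}")
```

Under those assumptions the prompt set holds 3,145,728 token positions, and a density of 0.013% corresponds to roughly 409 of them.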

© Neuronpedia 2025
