Neuronpedia

INDEX

Explanations

object methods

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token  Logit
"例文"  -0.89
"機種"  -0.88
"がある"  -0.87
" اطلاع"  -0.85
"RAINT"  -0.83
" compagnies"  -0.83
"另一方面"  -0.83
"lá"  -0.82
" parfum"  -0.82
"<bos>"  -0.81

Positive Logits

Token  Logit
"𖥸"  1.11
"conceito"  1.07
" samtidig"  1.02
"attes"  1.02
" ifølge"  1.01
"ᴴ"  0.98
" klachten"  0.98
" overeen"  0.98
" dieną"  0.98
"さんから"  0.97

Activations Density 0.024%

No Known Activations
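
To put the density figure in scale, the short sketch below converts it into an approximate token count. It assumes, which this page does not state, that the density is the fraction of dashboard tokens on which the feature fires and that it was measured over the 24,576 prompts of 128 tokens each listed above. The helper name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate how many tokens a feature fires on.

    Assumption: density_percent is the percentage of all tokens in the
    prompt set on which the feature has a nonzero activation.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


# Values taken from the dashboard configuration on this page.
total = 24_576 * 128
estimate = expected_active_tokens(24_576, 128, 0.024)
print(total, round(estimate))
```

Under that assumption the feature would fire on fewer than a thousand of roughly three million tokens, consistent with a very sparse feature.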

© Neuronpedia 2025