Neuronpedia


INDEX

Explanations

a followed by a word

np_acts-logits-general · gemini-2.5-flash-lite


Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

Token            Logit
"it"             -1.94
"its"            -1.48
" there"         -1.30
" this"          -0.98
"ConfigService"  -0.97
" when"          -0.96
" zichzelf"      -0.96
" इसकी"          -0.96
"piscina"        -0.96
" betrekking"    -0.95

Positive Logits

Token              Logit
" gese"            1.04
"rektor"           1.03
" البته"           1.03
" uLocal"          0.99
"的声音"           0.96
(blank in source)  0.96
"quent"            0.94
" Âge"             0.94
"irms"             0.94
" geste"           0.94

Activations Density 0.030%

No Known Activations

© Neuronpedia 2025
