Neuronpedia

INDEX

Explanations

index followed by =, funds, out, names, or tracking

np_acts-logits-general · gemini-2.5-flash-lite

Configuration: google/gemma-scope-2-12b-it/resid_post/layer_12_width_262k_l0_medium

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token        Logit
ص            2.19
с            1.88
ната         1.72
 EUROPE      1.69
ě            1.68
叭           1.68
 considéré   1.66
기           1.66
쏜           1.66
 چنان        1.62

Positive Logits

Token        Logit
dır          2.92
gew          2.23
gruppe       2.20
gruppen      2.14
gru          2.09
gp           1.96
gerät        1.91
𝙨            1.91
guo          1.90
𝙜            1.89

Activations Density 0.018%

No Known Activations
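
As a rough sanity check on the density figure, the sketch below estimates how many tokens the feature would be expected to fire on. It assumes the 0.018% density is measured over all tokens in the dashboard prompts (238,145 prompts of 512 tokens each); this page does not state that, so treat the result as an order-of-magnitude estimate. The variable names are illustrative only.

```python
# Figures copied from this page.
num_prompts = 238_145
tokens_per_prompt = 512
density_percent = 0.018

# Assumption (not stated on the page): density is the fraction of all
# dashboard tokens on which the feature has a nonzero activation.
total_tokens = num_prompts * tokens_per_prompt
expected_active = round(total_tokens * density_percent / 100)

print(f"total dashboard tokens: {total_tokens:,}")
print(f"approx. activating tokens: {expected_active:,}")
```

Under that assumption the feature would be active on only a few tens of thousands of tokens out of more than a hundred million, which is consistent with a sparse feature.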

© Neuronpedia 2026
