Neuronpedia

INDEX

Explanations

matrix, transformation, structures

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_40_width_262k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 समझाने  0.39
uba  0.37
 distracting  0.37
덜  0.36
 വള  0.35
hem  0.35
astava  0.35
ಹೆ  0.35
 endothelial  0.34
 aktiv  0.34

Positive Logits

пит  0.45
 ат  0.44
it  0.43
gt  0.43
 itin  0.43
 جم  0.42
 jams  0.42
lit  0.41
ⱪ  0.41
 ит  0.40

Activations Density 0.000%

No Known Activations

© Neuronpedia 2025
