Neuronpedia

Explanations

model responses

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-4b-it/resid_post/layer_22_width_65k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

kao

1.00

 strange

0.93

 pang

0.91

 specifica

0.90

 weird

0.90

 activo

0.87

 perplex

0.86

kwa

0.86

ⵅ

0.85

 mulig

0.84

Positive Logits

$\

1.49

\#

1.41

\[

1.35

$$\

1.31

$\

1.25

\-

1.24

1.23

\|

1.23

\|\

1.16

$\$

1.11

Activations Density 0.075%

No Known Activations

© Neuronpedia 2025
