Neuronpedia

Explanations

"between a" or "to a"

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-4b-it/transcoder_all/layer_5_width_262k_l0_small_affine

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

ுங்கள்  0.33
ｪ  0.33
 etiam  0.26
 से  0.26
ത്തിലും  0.26
氓  0.26
학과  0.26
 Ayrıca  0.26
학  0.26
 refund  0.25

Positive Logits

taining  0.35
tained  0.33
the  0.32
drawn  0.31
ፍተኛ  0.30
transformed  0.30
theless  0.29
dut  0.29
 dépens  0.29
 menschen  0.28

Activations Density 0.406%

No Known Activations

© Neuronpedia 2026
