Neuronpedia


INDEX

Explanations

start of week schedule

np_acts-logits-general · gemini-2.5-flash-lite


Configuration: google/gemma-scope-2-4b-it/resid_post/layer_9_width_262k_l0_medium
Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1
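The configuration string appears to pack several fields into one path: the organization, the release, the hook point (`resid_post`, the residual stream after the layer), the layer number, the dictionary width, and an L0 setting. The minimal sketch below splits the string into those fields. It assumes that the `org/release/hook/layer_N_width_W_l0_X` pattern holds in general. `SaeSource` and `parse_sae_source` are hypothetical names, not part of any Neuronpedia API.

```python
import re
from dataclasses import dataclass


@dataclass
class SaeSource:
    """Fields of an SAE identifier string (field names are hypothetical)."""
    org: str
    release: str
    hook: str
    layer: int
    width: str
    l0: str


# Assumed pattern for the last path segment, e.g. "layer_9_width_262k_l0_medium".
_VARIANT = re.compile(r"layer_(\d+)_width_(\w+?)_l0_(\w+)")


def parse_sae_source(source: str) -> SaeSource:
    """Split 'org/release/hook/variant' into parts (naming convention assumed)."""
    org, release, hook, variant = source.split("/")
    m = _VARIANT.fullmatch(variant)
    if m is None:
        raise ValueError(f"unrecognized variant segment: {variant!r}")
    layer, width, l0 = m.groups()
    return SaeSource(org, release, hook, int(layer), width, l0)


s = parse_sae_source("google/gemma-scope-2-4b-it/resid_post/layer_9_width_262k_l0_medium")
print(s.hook, s.layer, s.width, s.l0)  # resid_post 9 262k medium
```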


Negative Logits

"ם"          2.85
" sted"      2.70
"党员"        2.68
" tornar"    2.68
"п"          2.66
"จะ"          2.66
"ehr"        2.64
"ів"         2.63
"ър"         2.62
" tornare"   2.60

Positive Logits

"harth"      3.13
"ق"          3.06
"ज"          2.96
"ുള്ള"       2.69
"сал"        2.69
"ංශ"         2.64
"शहर"        2.63
" skriv"     2.61
"ВА"         2.60
"Phương"     2.59
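The Positive and Negative Logits lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. One common way to build such lists is to project the feature's decoder direction through the model's unembedding matrix and then take the highest and lowest entries. This page does not say whether its values were computed that way, so treat that as an assumption. The sketch below applies this method to toy data with NumPy. `top_logit_tokens` is a hypothetical helper, and the sketch ignores the model's final normalization layer. In this sketch, the negative list carries negative scores.

```python
import numpy as np


def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the direct effect of one feature's decoder direction on the logits.

    decoder_dir: shape (d_model,), the feature's row of the SAE decoder.
    unembed:     shape (d_model, vocab_size), the model's unembedding matrix.
    Ignores the final normalization layer (a simplifying assumption).
    """
    scores = decoder_dir @ unembed  # shape (vocab_size,)
    order = np.argsort(scores)      # token indices, lowest score first
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    return positive, negative


# Toy example with d_model = 2 and a four-token vocabulary.
vocab = ["a", "b", "c", "d"]
unembed = np.array([[3.0, -1.0, 0.5, -2.0],
                    [0.0,  0.0, 0.0,  0.0]])
pos, neg = top_logit_tokens(np.array([1.0, 0.0]), unembed, vocab, k=2)
print(pos)  # [('a', 3.0), ('c', 0.5)]
print(neg)  # [('d', -2.0), ('b', -1.0)]
```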

Activation density: 0.015%

No Known Activations
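Activation density is the share of token positions on which the feature is active. The minimal sketch below assumes that "active" means an activation strictly above zero and that the input is a flat array of per-token activations. This page states neither the threshold nor the token sample used to compute its density figure. `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy example: a feature that fires on 3 of 20,000 tokens.
acts = np.zeros(20_000)
acts[[10, 500, 9_999]] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.015%
```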

© Neuronpedia 2025
