© Neuronpedia 2026

Gemma-3-1B
7-GEMMASCOPE-2-RES-16K
Index: 14528

Explanations

self-harm or harm to others

np_acts-logits-general · gemini-2.5-flash-lite


Configuration

google/gemma-scope-2-1b-pt/resid_post/layer_7_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

('*	1.63
(',')	1.53
:<	1.51
தய	1.48
('')	1.46
 τῶν	1.46
Ort	1.42
curité	1.42
:(	1.42
 '-':	1.40

Positive Logits

by	1.41
eine	1.27
ในปี	1.22
စ်	1.22
 sevent	1.18
twenty	1.17
 although	1.15
ปี	1.14
 HALF	1.14
 walaupun	1.13

Activations Density 0.145%
