Neuronpedia

INDEX

Explanations

signs of harm

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium
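The configuration string above packs several fields into one slash-separated path. A minimal sketch of splitting it into parts might look like the following. The field names (`organization`, `release`, `hook`, `layer`, `width`, `l0`) and the helper `parse_sae_config` are hypothetical labels chosen for illustration; they are inferred from the shape of the string, not taken from Neuronpedia or Gemma Scope documentation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    """Hypothetical container; field names are illustrative assumptions."""
    organization: str
    release: str
    hook: str
    layer: int
    width: str
    l0: str


# Assumed shape of the last path segment: layer_<int>_width_<tag>_l0_<tag>
_VARIANT = re.compile(r"layer_(\d+)_width_([0-9a-z]+)_l0_([0-9a-z]+)")


def parse_sae_config(identifier: str) -> SaeConfig:
    """Split an identifier of the form org/release/hook/variant."""
    parts = identifier.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 path segments, got {len(parts)}")
    organization, release, hook, variant = parts
    match = _VARIANT.fullmatch(variant)
    if match is None:
        raise ValueError(f"unrecognized variant segment: {variant!r}")
    layer, width, l0 = match.groups()
    return SaeConfig(organization, release, hook, int(layer), width, l0)


config = parse_sae_config(
    "google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium"
)
```

Under these assumptions, `config.layer` is the integer 31 and `config.hook` is the string `resid_post`; identifiers that do not follow the four-segment shape raise `ValueError` rather than being guessed at.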

Prompts (Dashboard)

238,145 prompts, 512 tokens each
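As a rough sense of scale, the prompt count and per-prompt length can be multiplied to estimate the total number of tokens behind the dashboard. This is a sketch that assumes every prompt contributes exactly 512 tokens, which the page implies but does not spell out; the variable names are illustrative.

```python
# Figures taken from the dashboard line above.
prompts = 238_145
tokens_per_prompt = 512

# Assumption: every prompt is exactly 512 tokens long.
total_tokens = prompts * tokens_per_prompt

print(f"{total_tokens:,} tokens")  # prints "121,930,240 tokens"
```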

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"embe": 0.44
" shame": 0.41
"ên": 0.40
" platforms": 0.40
"パートナー": 0.39
"越": 0.39
" metadata": 0.39
" strukt": 0.38
"shade": 0.38
" соответствует": 0.37

Positive Logits

"μι": 0.60
"৷": 0.47
"rokken": 0.47
"ቝ": 0.46
"’": 0.45
"ۤ": 0.45
" Україні": 0.45
"nvp": 0.44
"HasStarred": 0.44
" Corbyn": 0.44

Activations Density 0.000%

No Known Activations

© Neuronpedia 2025
