Neuronpedia

Explanations

save the code

np_acts-logits-general · gemini-2.5-flash-lite

Configuration: google/gemma-scope-2-27b-it/resid_post/layer_40_width_262k_l0_medium

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1
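
The configuration identifier above appears to pack several fields into one path. A minimal sketch of splitting it apart follows. The field names (`org`, `model`, `hook`, `layer`, `width`, `l0`) and the helper `parse_source_id` are hypothetical labels inferred from the string itself, not a documented Neuronpedia schema.

```python
import re

# Assumed layout: <org>/<model>/<hook>/layer_<N>_width_<W>_l0_<L>.
# This pattern is inferred from the one identifier shown on this page.
SAE_PATTERN = re.compile(r"layer_(\d+)_width_([^_]+)_l0_(.+)")

def parse_source_id(source_id: str) -> dict:
    """Split a configuration identifier into its apparent components."""
    parts = source_id.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 path segments, got {len(parts)}")
    org, model, hook, sae = parts
    match = SAE_PATTERN.fullmatch(sae)
    if match is None:
        raise ValueError(f"unrecognized SAE segment: {sae!r}")
    return {
        "org": org,
        "model": model,
        "hook": hook,
        "layer": int(match.group(1)),
        "width": match.group(2),
        "l0": match.group(3),
    }

config = parse_source_id(
    "google/gemma-scope-2-27b-it/resid_post/layer_40_width_262k_l0_medium"
)
print(config)
```

Read this way, the feature comes from a sparse autoencoder on the residual stream after layer 40, with a dictionary width labelled 262k.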

Negative Logits

Token        Value
" offrir"    0.41
"Genre"      0.41
"прос"       0.39
" moan"      0.38
"VIDED"      0.37
" filtro"    0.37
" vostre"    0.37
"氛"         0.37
"Genres"     0.36
" तुमचे"     0.36

Positive Logits

Token        Value
" Please"    0.63
" please"    0.55
"请"         0.50
"Please"     0.49
"please"     0.47
"請"         0.46
" Simply"    0.43
"請"         0.41
" कृपया"     0.41
" Bitte"     0.40
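
Several of the positive-logit tokens differ only in casing or a leading space. The sketch below groups them by a normalized surface form to make that visible. The helper `group_by_surface_form` is hypothetical, and the values are copied from the list above, including the two 請 rows exactly as they appear on the page.

```python
from collections import defaultdict

# Top positive-logit tokens as listed on this page (leading spaces preserved).
positive_logits = [
    (" Please", 0.63), (" please", 0.55), ("请", 0.50), ("Please", 0.49),
    ("please", 0.47), ("請", 0.46), (" Simply", 0.43), ("請", 0.41),
    (" कृपया", 0.41), (" Bitte", 0.40),
]

def group_by_surface_form(rows):
    """Group tokens that match after stripping whitespace and case-folding."""
    groups = defaultdict(list)
    for token, value in rows:
        groups[token.strip().casefold()].append(value)
    return dict(groups)

groups = group_by_surface_form(positive_logits)
for form, values in groups.items():
    print(f"{form!r}: {len(values)} variant(s), max value {max(values):.2f}")
```

Under this grouping, four of the ten rows collapse into the English form "please", and most of the others are its counterparts in Chinese (请, 請), Hindi (कृपया), and German (Bitte).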

Activations Density 0.005%

No Known Activations
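
To put the density figure in perspective, the arithmetic below estimates how many dashboard tokens it would correspond to. This is a rough sketch under two assumptions that the page does not state: that density means the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the same 238,145 prompts of 512 tokens each. The percentage is also rounded, so the result is only an order-of-magnitude estimate.

```python
# Figures copied from the dashboard configuration and density line.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.005  # rounded value as displayed

def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with a nonzero activation)."""
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens, total_tokens * density_percent / 100.0

total_tokens, active_tokens = expected_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"{total_tokens:,} tokens in total")
print(f"about {active_tokens:,.0f} tokens with a nonzero activation")
```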

© Neuronpedia 2025