Neuronpedia



Explanations

would make

np_acts-logits-general · gemini-2.5-flash-lite


Configuration

google/gemma-scope-27b-pt-res/layer_34/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

Token         Logit
"volves"      -0.89
"helping"     -0.86
"┠"           -0.85
" Allows"     -0.82
"ebly"        -0.82
"giving"      -0.80
"iczna"       -0.77
" tiveram"    -0.77
"emm"         -0.76
"そう"        -0.76

Positive Logits

Token         Logit
" make"       4.41
" makes"      4.38
" making"     3.06
"Make"        3.05
"make"        3.05
" Makes"      3.03
"Makes"       2.97
" Make"       2.86
"makes"       2.77
"MAKE"        2.31
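
One way to read the logit values above is as additive shifts to the model's output logits. This is an assumption: the page does not state how the values are scaled, or what feature activation they correspond to. Under that assumption, adding a shift d to one token's logit multiplies that token's odds, relative to tokens whose logits are unchanged, by exp(d). A minimal Python sketch using a few values copied from the lists (the function name `odds_multiplier` is hypothetical):

```python
import math

# A few values copied from the logit lists on this page.
positive_logits = {" make": 4.41, " makes": 4.38, " making": 3.06}
negative_logits = {"volves": -0.89, "helping": -0.86}


def odds_multiplier(logit_delta: float) -> float:
    """Factor applied to a token's odds, relative to tokens whose logits
    are unchanged, when logit_delta is added to that token's logit.

    ASSUMPTION: the listed values act as additive logit shifts; the page
    does not document their scaling.
    """
    return math.exp(logit_delta)


for token, delta in {**positive_logits, **negative_logits}.items():
    # repr() keeps leading spaces in tokens visible.
    print(f"{token!r}: odds x{odds_multiplier(delta):.2f}")
```

On this reading, the feature strongly promotes forms of "make" and only mildly suppresses the tokens in the negative list, which is consistent with the "would make" explanation.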

Activations Density 0.035%

No Known Activations
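
The density figure can be put in absolute terms using the dashboard configuration above. The sketch below assumes the density is the fraction of dashboard tokens on which the feature fires; the page does not define it, so treat the result as a rough estimate. The function names are hypothetical.

```python
# Figures copied from the Configuration section and the density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.035


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(n_tokens: int, density_percent: float) -> float:
    """Estimated number of token positions where the feature is active.

    ASSUMPTION: density is the fraction of dashboard tokens with a
    nonzero activation.
    """
    return n_tokens * density_percent / 100.0


n = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
print(n, round(expected_active_tokens(n, DENSITY_PERCENT)))
```

Under this assumption the feature would be active on roughly a thousand of about three million token positions, even though the page lists no example activations.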

© Neuronpedia 2025
