Neuronpedia


INDEX

Explanations

sw + common suffix

np_acts-logits-general · gemini-2.5-flash-lite


Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


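Negative Logits

(token not rendered)    -3.22
The                     -2.61
(token not rendered)    -2.34
(token not rendered)    -2.34
ものに                  -2.34
al                      -2.28
和你                    -2.27
⸫                       -2.23
(token not rendered)    -2.22
(token not rendered)    -2.19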

Positive Logits

 GenerationType         2.55
locene                  2.53
ribune                  2.48
uchos                   2.47
 情侣                   2.47
LLocation               2.42
炱                      2.39
thouses                 2.38
ASTIC                   2.34
MessageTagHelper        2.34
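The page does not say how the two logit lists are produced. A common approach on dashboards of this kind is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below illustrates that idea under that assumption only; the function names, the 2-dimensional vectors, and the resulting scores are all hypothetical and are not taken from the model or from this feature.

```python
# Toy sketch of "top logits" for one feature.
# Assumption: scores come from a dot product between the feature's decoder
# vector and each token's unembedding column. All numbers are made up.

def logit_effects(decoder_vec, unembed):
    """Return {token: score}, where score = decoder_vec . unembedding column."""
    return {
        tok: sum(d * w for d, w in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }

def top_logits(effects, k):
    """Return (k most negative, k most positive) as lists of (token, score)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive

# Hypothetical 2-dimensional decoder direction and unembedding columns.
decoder_vec = [1.0, -0.5]
unembed = {
    "The": [-2.0, 1.0],
    "al": [-1.0, 2.0],
    "ribune": [2.0, -1.0],
    "locene": [2.0, -1.5],
}

effects = logit_effects(decoder_vec, unembed)
negative, positive = top_logits(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

With a real model the decoder vector and unembedding matrix would have the model's hidden dimension and full vocabulary, and the lists above would correspond to `k=10`.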

Activations Density 0.029%

No Known Activations

© Neuronpedia 2025
