© Neuronpedia 2026

Gemma-2-2B · 0-CLT-HP · Feature 97364

Explanation: "which" (np_max-act · gemini-2.0-flash)

Configuration

Prompts (dashboard): 16,384 prompts, 128 tokens each
Dataset (dashboard): monology/pile-uncopyrighted

Negative Logits

" which"     -1.26
"which"      -1.05
" WHICH"     -0.97
" Which"     -0.95
"Which"      -0.92
" mely"      -0.70
"hich"       -0.68
"quelles"    -0.65
"ซึ่ง"        -0.62
" wich"      -0.60

Positive Logits

" have"       0.85
"are"         0.84
"has"         0.72
" were"       0.72
" Armenians"  0.66
" owes"       0.65
" Serbs"      0.65
" serves"     0.65
"had"         0.63
" gives"      0.63
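The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists are commonly produced. It assumes the dashboard projects the feature's output (decoder) direction through the model's unembedding matrix and ignores the final normalization. Under that assumption, the procedure dots the direction with each token's unembedding column and then sorts the results. The helper names `logit_effects` and `top_and_bottom` and the toy two-dimensional vectors are hypothetical. They are not Neuronpedia's code or real Gemma-2-2B weights.

```python
from typing import Dict, List, Tuple

Ranking = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by dotting the feature's decoder direction with
    that token's unembedding column (hypothetical helper; no final norm)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranking, Ranking]:
    """Return the k most positive and the k most negative tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy, made-up values: a 2-dimensional "residual stream" and three tokens.
decoder_dir = [1.0, -0.5]
unembed = {
    " which": [-1.0, 0.5],
    " have": [0.5, -0.5],
    " are": [0.4, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)  # positive: [(' have', 0.75)]
print("negative:", negative)  # negative: [(' which', -1.25)]
```

With the toy inputs, " which" receives the most negative score and " have" the most positive, matching the shape of the real lists. A real dashboard would rank the model's full vocabulary instead of three tokens.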

Activation density: 0.023%

No Known Activations
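Activation density is the share of token positions on which the feature fires. The sketch below computes that fraction and then converts the reported 0.023% into an approximate token count. It assumes that "fires" means an activation strictly above zero, because this page does not state the dashboard's exact threshold. It also assumes the density was measured over the dashboard dataset (16,384 prompts of 128 tokens). Because 0.023% is a rounded figure, the resulting count is only approximate. The helper `activation_density` and the toy activations are hypothetical.

```python
from typing import List


def activation_density(activations: List[List[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of all token positions where the feature's activation
    exceeds `threshold` (hypothetical helper; threshold is an assumption)."""
    total = sum(len(prompt) for prompt in activations)
    active = sum(1 for prompt in activations for a in prompt if a > threshold)
    return active / total if total else 0.0


# Toy activations: 2 prompts x 4 tokens, with one position firing.
toy = [[0.0, 0.0, 3.1, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(activation_density(toy))  # 0.125

# Worked arithmetic, assuming the 0.023% figure was measured over the
# dashboard dataset (16,384 prompts x 128 tokens each).
total_tokens = 16_384 * 128             # 2,097,152 token positions
approx_active = 0.00023 * total_tokens  # about 482 positions
print(total_tokens, round(approx_active))  # 2097152 482
```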