Neuronpedia

Explanations

as

np_max-act · gemini-2.0-flash

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
"as"          -1.88
" sebagai"    -1.20
"as"          -0.89
" как"        -0.88
" као"        -0.85
"作为"        -0.83
" jako"       -0.81
" الاطلاع"    -0.77
"作為"        -0.77
" ως"         -0.75

Positive Logits

0.91

 well

0.80

the

0.71

an

0.71

pires

0.68

cribes

0.66

 follows

0.66

cription

0.61

 part

0.60

 opposed

0.60

Activations Density 0.364%
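
To put the density figure in context, the short sketch below works out how many tokens the dashboard prompts contain and roughly how many of them the feature would fire on. It assumes that "Activations Density" means the percentage of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration. Neither assumption is stated on this page, and the function names are illustrative only.

```python
# Figures copied from this page.
NUM_PROMPTS = 16_384        # Prompts (Dashboard)
TOKENS_PER_PROMPT = 128     # tokens per prompt
DENSITY_PERCENT = 0.364     # Activations Density, in percent


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the feature is active.

    Assumption: density is the percentage of tokens with nonzero activation.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under these assumptions the feature is active on only a few thousand of roughly two million tokens, which fits a feature tied to a single word such as "as".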

No Known Activations

© Neuronpedia 2025
