Neuronpedia

Explanations

feel

np_max-act · gemini-2.0-flash

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token        Logit
" Majefty"   -0.92
"ſelves"     -0.91
" myſelf"    -0.90
"Efq"        -0.88
" Jefus"     -0.87
" itſelf"    -0.83
" uſed"      -0.79
"ſelf"       -0.79
" Roskov"    -0.79
"InitVars"   -0.78

Positive Logits

Token                  Logit
"you"                  0.71
" like"                0.57
"it"                   0.57
(token text missing)   0.56
"Datuak"               0.56
"obod"                 0.51
" hlad"                0.49
"<td>"                 0.48
"no"                   0.47
" their"               0.47

Activations Density 0.067%

No Known Activations

© Neuronpedia 2025
