Neuronpedia

INDEX

Explanations

No Explanations Found

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token               Logit
" Monfieur"         -0.76
" Cæsar"            -0.73
"iconque"           -0.69
" rabbi"            -0.69
"lujah"             -0.68
" Urbano"           -0.62
" Shakspeare"       -0.60
" Moslem"           -0.59
" constancy"        -0.59
" Majefty"          -0.59

Positive Logits

Token               Logit
"<eos>"             0.71
"↵"                 0.70
(blank in source)   0.59
"abestanden"        0.52
"}{@"               0.51
(blank in source)   0.51
"Datuak"            0.51
"setcounter"        0.51
"قایناق‌لار"        0.49
"↵↵↵"               0.48

Activations Density 0.000%

No Known Activations

This feature has no known activations.

© Neuronpedia 2025
