Neuronpedia

Explanations

No Explanations Found

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
"重"           -0.52
"en"           -0.45
"str"          -0.44
(not shown)    -0.42
"kin"          -0.42
"cont"         -0.42
"kl"           -0.41
"xxx"          -0.41
(not shown)    -0.41
"al"           -0.41

Positive Logits

Token                 Logit
"<bos>"               1.33
"Autoritní"           0.98
" resourceCulture"    0.90
" &___"               0.89
"الدراسه"             0.87
"UserScript"          0.87
" تضيفلها"            0.85
" autorytatywna"      0.83
"省市镇"              0.82
" kasarigan"          0.81

Activations Density 0.009%

No Known Activations
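
To put the density figure in context, the arithmetic below relates it to the dashboard configuration. This is a rough sketch, not the site's own calculation. It assumes that "Activations Density" is the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the same 16,384 prompts of 128 tokens each; neither assumption is confirmed by the page. The function names are hypothetical.

```python
# Values copied from the page.
NUM_PROMPTS = 16_384        # "16,384 prompts"
TOKENS_PER_PROMPT = 128     # "128 tokens each"
DENSITY_PERCENT = 0.009     # "Activations Density 0.009%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of tokens with a nonzero activation.

    Assumes density is a percentage of all tokens (an assumption,
    not something the page states).
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:.0f}")
```

Under these assumptions the feature would fire on roughly 189 of about 2.1 million tokens, which is consistent with a very sparse feature.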

© Neuronpedia 2025
