Explanations

ethical boundaries, explicit content, or translations

This neuron does not pick out coherent words or linguistic patterns. It spikes on stray formatting or parsing artifacts (the few tokens with abnormally large activation values), which suggests it responds to occasional tokenization glitches rather than to real text content.

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Tokens are quoted so that leading spaces stay visible.

"sun"       0.77
"悌"        0.75
"өр"        0.74
"忙"        0.73
"ﻖ"         0.73
"ﻪ"         0.72
" olen"     0.71
" moja"     0.70
"scoped"    0.70
"ﬃ"         0.70

Positive Logits

Tokens are quoted so that leading spaces stay visible.

" faisant"       0.84
"ността"         0.80
" हाउ"           0.79
" nettoy"        0.77
"itism"          0.77
" couvrir"       0.76
"earn"           0.75
"damian"         0.73
" conç"          0.73
" rougeâtres"    0.71

Activations Density 0.000%
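
To put the density figure in context, the sketch below works out how many token positions the configured prompt set contains (392,802 prompts of 256 tokens each, i.e. 100,557,312 tokens) and roughly how many of them would have to be active before the displayed percentage stopped reading 0.000%. It rests on two assumptions that the dashboard does not state: that the density is the share of tokens in this prompt set with a nonzero activation, and that the percentage is rounded (not truncated) to three decimal places. The function names are hypothetical and exist only for this illustration.

```python
# Figures copied from the Configuration section of this dashboard.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def rounding_bound(total: int, decimals: int = 3) -> float:
    """Active-token count at which a percentage stops rounding to zero.

    ASSUMPTION: the density is shown in percent and rounded to
    `decimals` decimal places; the dashboard does not confirm this.
    """
    # Half of the last displayed unit, still expressed in percent.
    half_unit_percent = 0.5 * 10 ** (-decimals)
    # Convert percent to a fraction, then scale by the token count.
    return total * half_unit_percent / 100


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    bound = rounding_bound(total)
    print(f"total tokens: {total:,}")
    print(f"active tokens needed to display above 0.000%: about {bound:.0f}")
```

Under those assumptions, a reading of 0.000% is consistent with the neuron firing on only a few hundred tokens out of roughly a hundred million, which fits the explanation that it reacts to rare artifacts. If the dashboard truncates instead of rounding, the bound would be about twice as large.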