INDEX

Explanations

harmful content refusals

The neuron is selectively activating on floating‐point numeric literals (numbers with a decimal component).

New Auto-Interp

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Embeds

IFrame

Link

Not in Any Lists

Negative Logits

ului

0.68

habitants

0.67

ယ်

0.61

 ওই

0.60

گری

0.59

ان

0.58

 ____

0.58

کس

0.56

 गिरी

0.55

POSITIVE LOGITS

 sorts

1.20

TEN

0.83

 note

0.77

 dubious

0.72

necess

0.71

sort

0.71

 sorte

0.70

ছেন

0.70

ッチン

0.69

 Vorteil

0.69

Activations Density 0.411%