
Explanations

distinguish, district, disturb, distribute

The neuron activates on tokens that begin with the letters “dist”, such as the “Dist” prefix in words like “Distinguish”, “District”, and “Disturbing”.
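The explanation reduces to a simple prefix rule. The sketch below writes it as a predicate over token strings. It is not the method used to generate or score the explanation. The function name `matches_dist_prefix` is hypothetical. Ignoring leading whitespace and letter case is an assumption about how tokenizers render words like “ District”.

```python
def matches_dist_prefix(token: str) -> bool:
    """Return True if a token begins with 'dist', ignoring case.

    Hypothetical helper. Assumption: leading whitespace added by the
    tokenizer (e.g. ' District') is stripped before checking the prefix.
    """
    return token.lstrip().lower().startswith("dist")


examples = ["Dist", " District", "disturbing", " distribute", "instinct", " display"]
# Keep only the tokens the prefix rule would fire on
print([t for t in examples if matches_dist_prefix(t)])
```

A rule this simple also fires on words the list does not mention, such as “distance”. That behavior is consistent with the explanation as stated.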


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted
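The dashboard configuration implies a fixed token budget. The calculation below assumes every prompt is exactly 128 tokens long, as stated above:

```python
# Dashboard configuration as listed on this page
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128

# Total number of token positions the dashboard examines
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,} tokens")  # 3,145,728 tokens
```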


Negative Logits

(token not rendered)  -3.64
(token not rendered)  -3.25
With  -3.22
Because  -2.92
The  -2.92
After  -2.91
(token not rendered)  -2.89
What  -2.88
When  -2.78

Positive Logits

玙  3.08
🧌  3.03
ୌ  2.95
埽  2.83
苹  2.80
镭  2.73
brakk  2.72
we  2.72
 autre  2.69
蘼  2.69

Activation density: 0.004%
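Compared with the dashboard configuration, the activation density shows how rarely this neuron fires. This page does not confirm two assumptions the sketch makes. It takes density to be the fraction of tokens with nonzero activation. It also takes that fraction to have been measured over the same 24,576 × 128 tokens listed under Configuration.

```python
# Assumption: density is measured over the dashboard's prompt set
DENSITY_PERCENT = 0.004          # activation density shown on this page
total_tokens = 24_576 * 128      # prompts x tokens per prompt

# Expected number of token positions where the neuron is active
expected_active = total_tokens * DENSITY_PERCENT / 100
print(round(expected_active))
```

Under those assumptions, the neuron is active on about 126 tokens out of roughly 3.1 million.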