Explanations

The Walking Dead (TWD) characters and Alexandria

This neuron marks the first few “content” words of a new sentence or block (e.g. the initial substantive words in a paragraph).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
" caffeine"    -0.78
" paws"        -0.78
"辄"           -0.78
"猥"           -0.78
"Ys"           -0.76
" snug"        -0.76
"RIO"          -0.75
"beak"         -0.75
"lask"         -0.74
"Marble"       -0.74

Positive Logits

Token            Logit
" walker"        1.30
" walkers"       1.29
" Alexand"       1.11
" Alexandria"    1.06
" communities"   1.06
" Judith"        1.04
" Whisper"       0.96
" Communities"   0.96
" Daryl"         0.94
" Hers"          0.93

Activations Density 0.007%
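
To give a sense of scale for the density figure, the sketch below converts it into an approximate token count. It rests on two assumptions that the page does not state: that "Activations Density" means the fraction of tokens on which the neuron has a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration. The function names are illustrative only.

```python
# Values copied from the dashboard page.
NUM_PROMPTS = 24_576        # prompts in the dashboard set
TOKENS_PER_PROMPT = 128     # tokens per prompt
DENSITY_PERCENT = 0.007     # reported "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of tokens with nonzero activation.

    Assumes density is the fraction of tokens on which the neuron fires,
    expressed as a percentage (an assumption, not confirmed by the page).
    """
    return total * density_percent / 100


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, DENSITY_PERCENT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {active:.0f}")
```

Under these assumptions the neuron would fire on roughly 220 of about 3.1 million tokens, which is consistent with a sparse feature tied to a narrow topic.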