Explanations

trains and stations

The neuron fires strongly whenever it sees the word “train” (including in phrases such as “train wreck” or “train of action potentials”); that is, it detects mentions of “train”.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each
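As a quick arithmetic check, this configuration implies the total number of token positions in the dashboard sample, assuming every prompt is exactly 128 tokens long as stated:

```python
# Total tokens in the dashboard sample, from the configuration above:
# 24,576 prompts, each 128 tokens long.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,} tokens")  # prints "3,145,728 tokens"
```

So the feature's activations are summarized over roughly 3.1 million token positions.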

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)   Logit
" waarschijnlijk"                       -0.84
"ettbewerb"                             -0.82
"늙"                                    -0.81
"ipro"                                  -0.80
"donde"                                 -0.80
"obrázek"                               -0.79
"渃"                                    -0.79
"zdjęcie"                               -0.79
" pabrik"                               -0.79
" moeilijk"                             -0.78

Positive Logits

Token (quoted to show leading spaces)   Logit
" train"                                1.70
" Train"                                1.52
" trains"                               1.32
"Train"                                 1.25
"TRAIN"                                 1.20
"列車"                                  1.20
" TRAIN"                                1.20
"wreck"                                 1.13
" wreck"                                1.11
" Trains"                               1.10
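One common way logit tables like these are produced (an assumption here; this page does not say how its values were computed) is to project the feature's decoder direction onto the model's unembedding matrix and rank every vocabulary token by the resulting score. The minimal sketch below does that in plain Python; the function name `top_logit_effects`, the toy vocabulary, and all vectors are hypothetical illustrations, not real model weights.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction pushes their logits up or down (direct logit effect).
# All names and numbers below are illustrative assumptions.

def top_logit_effects(decoder_vec, unembed, vocab, k):
    """Return (top_positive, top_negative) lists of (token, score).

    decoder_vec: list[float] of length d_model (the feature's output direction)
    unembed: d_model rows, each a list[float] with one entry per vocab token
    vocab: list[str] of token strings
    """
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        s = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Highest scores first for positive; lowest scores first for negative.
    return ranked[:k], ranked[::-1][:k]


# Toy example with d_model = 2 and a four-token vocabulary.
vocab = [" train", " Train", " wreck", " donde"]
decoder_vec = [1.0, 0.5]
unembed = [
    [1.0, 0.8, 0.6, -0.5],
    [0.4, 0.6, 0.2, -0.6],
]
positive, negative = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print([t for t, _ in positive])  # prints [' train', ' Train']
print([t for t, _ in negative])  # prints [' donde', ' wreck']
```

With real weights, the same ranking over the full vocabulary would yield a positive list and a negative list analogous to the two tables above.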

Activation Density: 0.013%
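Activation density is commonly defined as the fraction of token positions on which a feature's activation is above zero. The sketch below assumes that definition (this page does not state it), computes it on made-up toy activations, and then, as a further assumption, estimates how many tokens the feature would fire on if the 0.013% figure were measured over the 24,576 × 128-token dashboard sample.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where the feature's activation is strictly positive. The toy activations
# below are made up for illustration.

def activation_density(activations):
    """Fraction of token positions with activation > 0."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


toy = [0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(toy):.1%}")  # prints "10.0%"

# Assumption: if the 0.013% figure were measured over the 3,145,728-token
# dashboard sample, the feature would fire on roughly this many tokens.
total_tokens = 24_576 * 128
expected_firing = 0.013 / 100 * total_tokens
print(round(expected_firing))  # prints 409
```

Under these assumptions, the feature is active on only a few hundred of the roughly 3.1 million sampled tokens, consistent with a narrow detector for mentions of “train”.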