INDEX

Explanations

names or prefixes followed by specific words

The neuron selectively activates on the initial subword pieces of longer, less common proper names or technical terms.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token         Logit
"for"         -3.44
"so"          -3.34
"has"         -3.17
" after"      -2.77
" what"       -2.75
"but"         -2.61
" before"     -2.56
" because"    -2.56
" just"       -2.56
"and"         -2.56

Positive Logits

Token         Logit
" trám"       2.92
" refroid"    2.59
" normalt"    2.56
"粜"          2.56
"躓"          2.47
" rafraî"     2.45
" régal"      2.41
" alltid"     2.31
" corriger"   2.30
" eerste"     2.28

Activations Density 0.704%
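
To put the density figure in context, the short sketch below combines it with the prompt configuration listed above. It assumes that the density is the share of tokens in the dashboard prompt set on which the neuron has a nonzero activation; the page itself does not state how the figure is computed, and the function names are hypothetical.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.704


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(n_tokens: int, density_percent: float) -> int:
    """Rough count of tokens on which the neuron fires.

    Assumption (not stated on the page): density is the percentage of
    tokens in the prompt set with a nonzero activation.
    """
    return round(n_tokens * density_percent / 100)


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
n_active = estimated_active_tokens(n_tokens, ACTIVATION_DENSITY_PERCENT)

print(f"Tokens in prompt set: {n_tokens:,}")
print(f"Estimated active tokens: {n_active:,}")
```

Under that assumption the neuron fires on only a small fraction of the prompt set, which is in line with the explanation that it responds to the initial subword pieces of longer, less common names and terms.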