Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" itſelf"    -0.94
" Wikimedijinoj"    -0.93
"lably"    -0.93
"󠁴"    -0.91
"tubers"    -0.90
" isInitialized"    -0.89
" Савезне"    -0.87
" pleaſure"    -0.87
"othed"    -0.86
" ब्रेकडाउन"    -0.86

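Positive Logits

Token (quoted; some tokens are missing from the source text)    Logit
"multer"    0.59
(token missing)    0.58
"“"    0.55
(token missing)    0.50
"for"    0.50
(token missing)    0.49
"’"    0.48
"<eos>"    0.47
(token missing)    0.46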

Activations Density 0.082%