Explanations

incentive design, copyright notices, lists

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token              Logit
" Stands"          -0.69
"⸝"                -0.65
" ECONOMIC"        -0.64
"共产党"            -0.64
"ದು"               -0.63
" 中村"             -0.63
"大约"              -0.62
"deterministic"    -0.61
" STAND"           -0.60
" Bumi"            -0.60

Positive Logits

Token              Logit
" kapas"           0.83
"Pvt"              0.68
" Maud"            0.68
"扁平"              0.67
"MK"               0.66
"幸"                0.66
" mechanistic"     0.65
"nologue"          0.65
" climb"           0.64
"MACH"             0.64

Activations Density 0.101%