Explanations

starts of descriptive phrases

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

" believes"     -0.97
" auxiliar"     -0.96
" aeron"        -0.96
" anum"         -0.95
" diras"        -0.93
"믿"            -0.91
" precau"       -0.90
" September"    -0.90
"ských"         -0.89
" believe"      -0.88

Positive Logits

"is"            1.19
" also"         1.13
" acht"         1.11
" també"        1.09
"can"           0.99
" także"        0.98
[token not shown]   0.94
" cũng"         0.92
" stück"        0.92
" passend"      0.90

Activations Density 0.000%