Explanations

avoids negative or excessive outputs

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Value
" unmistakable"   0.17
" konnten"        0.15
"three"           0.15
" pourront"       0.15
"都能"            0.15
" four"           0.14
" adjacent"       0.14
"neath"           0.14

Positive Logits

Token              Value
" needlessly"      0.27
" unnecessarily"   0.22
" unduly"          0.21
" risking"         0.21
" endangering"     0.21
"那么多"           0.21
" misuse"          0.21
" blindly"         0.21
" столь"           0.20
" jeopard"         0.20

Activations Density: 5.888%