Explanations

terms related to consequences and penalties for wrongdoings

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token         Logit
"Ạ"           -0.07
"SCO"         -0.07
"ucha"        -0.07
"nav"         -0.06
" somehow"    -0.06
"igham"       -0.06
"otor"        -0.06
"kili"        -0.06
"ONGO"        -0.06
"ighbors"     -0.06

Positive Logits

Token            Logit
" permanently"   0.09
"δη"             0.08
" suspension"    0.07
"ban"            0.07
" permanent"     0.07
" temporarily"   0.07
"roker"          0.07
" perman"        0.07
" loss"          0.06
" Bans"          0.06

Activations Density 0.013%
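
To put the density figure in context, the arithmetic below is a rough sketch. It assumes the 0.013% density is measured over the same 24,576 prompts of 128 tokens listed under Configuration, which the dashboard does not state explicitly. The function and constant names are illustrative only.

```python
# Figures copied from the dashboard; the pairing of the density with
# this prompt set is an assumption, not something the page confirms.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.013


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of token positions where the feature is non-zero."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. activating tokens: {active:.0f}")
```

Under that assumption the feature would fire on only a few hundred of roughly three million token positions, which is consistent with a narrow feature about bans, suspensions, and similar penalties.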