
Explanations

phrases indicating negative outcomes or conclusions


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B


Negative Logits

"celik"  -0.08
"ysa"  -0.08
" spender"  -0.08
".appspot"  -0.07
"inkle"  -0.07
" neod"  -0.07
"cheme"  -0.07
"µľ"  -0.07
"interop"  -0.07
"esi"  -0.07

Positive Logits

"cul"  0.07
(token missing)  0.06
" repeat"  0.06
" compromise"  0.06
" Stre"  0.06
" complete"  0.06
"Twe"  0.06
" disaster"  0.06
" either"  0.06
"sel"  0.06
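Logit lists like these are commonly produced on SAE feature dashboards by projecting the feature's decoder vector through the model's unembedding matrix W_U and reading off the most negative and most positive token entries. The page does not say exactly how these values were computed, so the following is a minimal sketch under that assumption, without any layer-norm folding or rescaling a real dashboard might apply. The function `top_logit_effects` and all variable names are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k most negative and k most positive token logit effects
    of a single feature direction (hypothetical helper, not a library API).

    decoder_vec: shape (d_model,), the feature's decoder (write-out) vector
    W_U:         shape (d_model, d_vocab), the model's unembedding matrix
    vocab:       list of d_vocab token strings
    """
    effects = decoder_vec @ W_U          # shape (d_vocab,): one effect per token
    order = np.argsort(effects)          # token indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 2 and a four-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.1, -0.7],
                [0.3,  0.3, 0.3,  0.3]])
decoder_vec = np.array([1.0, 0.0])
neg, pos = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

With a real model, `vocab` would hold the tokenizer's decoded tokens, which is why entries such as " spender" keep their leading space.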

Activation density: 0.014%
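Activation density is the fraction of token positions on which the feature fires. Assuming it is measured over the dashboard dataset listed under Configuration (24,576 prompts of 128 tokens, i.e. 3,145,728 token positions) and that "active" means an activation strictly above zero, 0.014% corresponds to roughly 440 positions. Both the dataset and the zero threshold are assumptions, and the displayed percentage is rounded. The helper `activation_density` below is a hypothetical minimal sketch.

```python
import numpy as np

# Dashboard configuration from the section above.
N_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above `threshold`.

    `acts` is assumed to have shape (n_prompts, tokens_per_prompt).
    """
    return float((np.asarray(acts) > threshold).mean())

total_tokens = N_PROMPTS * TOKENS_PER_PROMPT   # 3,145,728 token positions
expected_active = total_tokens * 0.014 / 100   # about 440 positions at 0.014%

# Toy example: 2 active positions out of 4 x 8 = 32.
toy = np.zeros((4, 8))
toy[0, 3] = 1.2
toy[2, 5] = 0.4
print(activation_density(toy))  # 0.0625
```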