Explanations

negative phrases or concepts related to failure and dissatisfaction

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token           Logit
"Decoration"    -0.08
"ãĤ¥"           -0.07
" eskort"       -0.07
"thro"          -0.07
"boru"          -0.07
"thouse"        -0.07
"OMIC"          -0.07
"ãĥ¼ãĥ"         -0.07
"MOOTH"         -0.07
"quared"        -0.07

Positive Logits

Token           Logit
"olson"         0.06
" casc"         0.06
" prest"        0.06
"kel"           0.06
"of"            0.06
" Bauer"        0.06
"Pav"           0.06
"ogan"          0.06
" Alter"        0.05
"HING"          0.05

Activations Density 0.014%
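
To give the density figure some scale, the short sketch below combines it with the prompt configuration listed above. It assumes, and the page does not confirm this, that the activation density is the fraction of all dashboard tokens on which the feature fires, and that every prompt contributes exactly 128 tokens. The function names are illustrative only.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.014


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of token positions where the feature is active.

    Assumption: density is a percentage of all token positions.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:.0f}")
```

Under these assumptions the feature would be active on roughly 440 of about 3.1 million tokens, which fits how small the logit effects in the lists above are.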