INDEX

Explanations

phrases indicating discrepancies or differences in outcomes

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

-0.06

 Sala

-0.06

aiser

-0.06

czy

-0.06

ades

-0.06

cre

-0.06

 dangling

-0.06

отреб

-0.06

ост

-0.06

sg

-0.06

Positive Logits

shadow

0.07

olson

0.07

achen

0.07

سانی

0.07

っく

0.07

اسان

0.07

outu

0.07

iffs

0.07

../../../../

0.07

phem

0.07

Activations Density 0.000%