Explanations

references to casualties or loss of life

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

ÏĥÏĦÎ±

-0.07

ãģ¥

-0.07

imli

-0.07

 armed

-0.07

igham

-0.07

ierz

-0.07

mam

-0.07

ãĥ³ãĥĦ

-0.07

 Sachs

-0.07

æ¨

-0.07

Positive Logits

 losses

0.10

 loss

0.09

loss

0.09

 Loss

0.08

Loss

0.07

 toll

0.07

Cas

0.07

cas

0.07

 attr

0.07

 injury

0.06

Activations Density 0.052%
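
As a rough sanity check, the density can be turned into an approximate count of activating tokens. This sketch assumes the density is the fraction of token positions with a nonzero activation, measured over the dashboard prompt set listed under Configuration; the page does not state this explicitly.

```python
# Figures taken from the Configuration section of this page.
n_prompts = 24_576
tokens_per_prompt = 128
density = 0.052 / 100  # 0.052% expressed as a fraction

# Total token positions in the dashboard sample.
total_tokens = n_prompts * tokens_per_prompt

# Assumption: density = active positions / total positions.
expected_active = total_tokens * density

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {expected_active:,.0f}")
```

Under that assumption, the feature fires on roughly 1,600 of about 3.1 million token positions.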