Explanations

phrases indicating the concept of reasons or justification

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

 Brend

-0.07

ijd

-0.07

ral

-0.07

imet

-0.06

лоп

-0.06

orman

-0.06

aci

-0.06

举

-0.06

IFORM

-0.06

imat

-0.06

Positive Logits

 because

0.12

 obvious

0.12

because

0.12

 reasons

0.11

 Because

0.10

 porque

0.10

Because

0.10

 omdat

0.10

ecause

0.09

，因为

0.09

Activations Density 0.028%
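
To put the density figure in context, here is a rough sketch of the arithmetic. It assumes the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the 24,576 prompts of 128 tokens listed under Configuration. Neither assumption is stated on the page, and the variable names are illustrative.

```python
# Figures copied from the dashboard listing.
num_prompts = 24_576
tokens_per_prompt = 128
activation_density = 0.028 / 100  # 0.028% expressed as a fraction

# Total number of tokens across all dashboard prompts.
total_tokens = num_prompts * tokens_per_prompt

# Assumption: density = fraction of those tokens where the feature fires.
expected_active_tokens = total_tokens * activation_density

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {expected_active_tokens:.0f}")
```

Under those assumptions the feature would fire on roughly 881 of about 3.1 million tokens, which fits a sparse feature tied to a narrow set of "because"-like tokens.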