INDEX

Explanations

phrases related to justification and the necessity of explanations or reasons

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token         Logit
"ernes"       -0.08
"ồ"           -0.07
"/effects"    -0.07
"urr"         -0.07
"ffects"      -0.07
".actions"    -0.07
"Verb"        -0.07
"eko"         -0.07
"Leaf"        -0.07
" اعتماد"     -0.07

Positive Logits

Token               Logit
" justify"          0.14
" justification"    0.13
" reasons"          0.12
" Reasons"          0.11
" justified"        0.11
" rationale"        0.10
" reason"           0.10
"justify"           0.10
"reason"            0.10
"Reason"            0.10

Activations Density 0.022%
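
To give a rough sense of scale for the density figure, the sketch below combines it with the prompt configuration listed above (24,576 prompts of 128 tokens each). It assumes the density was measured over that same prompt set, which the page does not state. The constant and function names are illustrative only and not part of any dashboard API.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.022  # reported as a percentage


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of token positions where the feature is active.

    Assumption: the density is the fraction of token positions with a
    non-zero activation, measured over the same prompt set.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:.0f}")
```

Under that assumption, the feature would fire on only a few hundred of roughly three million token positions, which fits a narrow feature focused on justification and reasons.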