Explanations

expressions of decision-making and confidence

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token            Logit
"aji"            -0.07
"linger"         -0.07
"agner"          -0.06
"ẫ"              -0.06
"enn"            -0.06
" STRICT"        -0.06
"wig"            -0.06
" WHATSOEVER"    -0.06
"WithURL"        -0.06

Positive Logits

Token               Logit
" correct"          0.09
" correctness"      0.09
" decisions"        0.08
"OK"                0.07
" direction"        0.07
" Correct"          0.07
"correct"           0.07
" justification"    0.07
" choices"          0.07
"ok"                0.06

Activations Density 0.036%
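
To give a sense of scale for the figures above, the following is a minimal sketch of the arithmetic, not part of the dashboard itself. It assumes the activation density is measured over the tokens of the dashboard prompts, which the page does not state; the function names are hypothetical.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.036 / 100  # 0.036% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> float:
    """Rough count of tokens on which the feature is non-zero.

    ASSUMPTION: the density is computed over the same tokens as the
    dashboard prompts; the source does not confirm this.
    """
    return total * density


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)

print(total)          # 3145728
print(round(active))  # 1132
```

Under that assumption, the feature would fire on roughly 1,100 of about 3.1 million tokens, which is consistent with a sparse feature.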