Explanations

instances of admission or acknowledgment of mistakes and failures

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

"Disposed"    -0.07
"ابت"         -0.07
"ebek"        -0.07
"plet"        -0.07
"onth"        -0.07
"iev"         -0.07
"โลก"         -0.07
"basePath"    -0.06
"ानन"         -0.06
"leh"         -0.06
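
Several entries in the negative-logit list are non-Latin tokens. Byte-level BPE vocabularies commonly display such tokens one character per byte, which is why they can appear as strings like `Ø§Ø¨Øª`. The sketch below maps that display form back to readable text. It assumes the tokenizer uses the byte-to-character table popularized by GPT-2's byte-level BPE; the page itself does not state which tokenizer is in use, and the helper names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the feature's tokenizer uses this table. Bytes that are already
    printable map to themselves; the rest are shifted to code points >= 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display_form: str) -> str:
    """Turn a byte-level display string back into UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display_form)
    # A single token may end mid-character, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for shown in ["Ø§Ø¨Øª", "à¹Ĥà¸¥à¸ģ", "à¤¾à¤¨à¤¨", "ebek"]:
        print(repr(shown), "->", repr(decode_token(shown)))
```

Under this assumption the three non-Latin entries decode to Arabic, Thai, and Devanagari fragments, while plain ASCII tokens such as `ebek` pass through unchanged.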

Positive Logits

" defeat"        0.15
" defeated"      0.10
" defeats"       0.09
" mistakes"      0.09
" reality"       0.09
" mistake"       0.08
" ownership"     0.08
" error"         0.08
" admission"     0.08
" wrongdoing"    0.08

Activations Density 0.016%
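
To put the density figure in context, the short calculation below estimates how many tokens the feature fires on. It is a rough sketch under two assumptions that the page does not confirm: that density means the fraction of tokens with a nonzero activation, and that it was measured over the prompt set listed under Configuration.

```python
# Values copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.016

# Assumption: density = (tokens with nonzero activation) / (all tokens).
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
expected_active_tokens = total_tokens * ACTIVATION_DENSITY_PERCENT / 100

print(f"total tokens:            {total_tokens:,}")
print(f"estimated active tokens: {expected_active_tokens:,.0f}")
```

If those assumptions hold, the feature is active on roughly five hundred of about three million tokens, which fits a narrow feature that responds to a specific theme.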