INDEX

Explanations

negative statements and the concept of impossibility

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B
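As a quick sanity check on the sample size, the prompt configuration above (24,576 prompts of 128 tokens each) implies a fixed token budget. The sketch below only does that arithmetic; the variable names are illustrative and do not come from the dashboard.

```python
# Values copied from the Configuration section; names are hypothetical.
num_prompts = 24_576
tokens_per_prompt = 128

# Total number of token positions covered by the dashboard prompts.
total_tokens = num_prompts * tokens_per_prompt
print(f"{total_tokens:,} tokens")
```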

Negative Logits

Token        Logit
…)↵↵         -0.08
allon        -0.08
umd          -0.08
ransition    -0.07
="__         -0.07
_mC          -0.07
殊           -0.07
นวน          -0.07
nung         -0.07
код          -0.07

Positive Logits

Token        Logit
 fail        0.07
 deny        0.06
 harm        0.06
 fails       0.06
 miss        0.06
 Cotton      0.06
 ignore      0.06
 down        0.06
not          0.06
 question    0.06

Activations Density 0.027%
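To give the density figure a rough scale, the sketch below converts 0.027% into an approximate count of token positions. It assumes, and the page does not confirm this, that the density is the fraction of token positions on which the feature is active, and that it was measured over the same 24,576 × 128 dashboard tokens. If the density comes from a different sample, the count does not apply.

```python
# Values copied from the page; names are hypothetical.
num_prompts = 24_576
tokens_per_prompt = 128
activation_density = 0.027 / 100  # 0.027% expressed as a fraction

# ASSUMPTION: density = (active token positions) / (all token positions),
# measured over the dashboard prompts themselves.
total_tokens = num_prompts * tokens_per_prompt
expected_active = activation_density * total_tokens

print(f"about {round(expected_active)} active positions out of {total_tokens:,}")
```

Under that assumption the feature would fire on fewer than a thousand positions in the whole sample, which fits a sparse feature.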