INDEX

Explanations

Medical/Technical issues

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

 Safe          -0.58
 risk          -0.52
 warning       -0.51
Safe           -0.51
 protection    -0.49
 dangerous     -0.48
 safety        -0.48
 warn          -0.46
 danger        -0.46
afety          -0.45

Positive Logits

WithIOException    0.84
 informée          0.83
 Reſ               0.82
 Anſ               0.80
 kasarigan         0.79
 Theſe             0.79
 themſelves        0.78
$_"                0.78
NameInMap          0.77
 Jefus             0.77

Activations Density 0.015%
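
To put the density figure in scale, the short sketch below combines it with the dashboard configuration (16,384 prompts of 128 tokens each). It assumes, and this is only an assumption, that the density is the fraction of tokens on which the feature has a non-zero activation and that it was measured over the same dashboard prompts. The function and variable names are hypothetical and exist only for this illustration.

```python
# Figures copied from the dashboard configuration and density line.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.015  # "Activations Density 0.015%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of tokens covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Tokens with a non-zero activation.

    Assumption: density is the share of tokens on which the feature fires,
    expressed as a percentage.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"{total:,} tokens in total")
print(f"roughly {active:.0f} tokens with a non-zero activation")
```

Under that reading, the feature fires on a few hundred of the roughly 2.1 million tokens, which is why it is considered sparse.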