Explanations

preventing negative outcomes

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Logit
"ONIA"            -0.88
" légèrement"     -0.87
"お風呂"           -0.84
" ſol"            -0.83
"Decay"           -0.82
" Künst"          -0.82
"Descriere"       -0.81
" сигнал"         -0.81
" prode"          -0.80
"zado"            -0.79

Positive Logits

Token             Logit
" premature"      1.20
" excessive"      1.16
" undesirable"    1.09
" excessively"    1.05
" problems"       1.03
" undes"          1.03
" damage"         0.96
"undes"           0.94
" unwanted"       0.89
" damaging"       0.89

Activations Density 0.165%
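
As a rough sanity check, the density can be converted into an approximate token count. This is a minimal sketch. It assumes the density is the fraction of dashboard tokens (24,576 prompts of 128 tokens each) on which the feature is active. That definition and the function name `estimated_active_tokens` are illustrative assumptions, not something the dashboard states.

```python
# Figures taken from the dashboard configuration above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.165


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature fires).

    Assumption: density is measured per token over the dashboard prompts.
    """
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under that assumption, the feature would fire on roughly 5,190 of about 3.1 million tokens, which fits a sparse feature that responds to a narrow concept.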