Explanations

undesirable states or qualities

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (in quotes)    Logit
"ビーフ"    -0.81
"ツヤ"    -0.79
"igli"    -0.76
" heuristic"    -0.76
" silencing"    -0.75
"Mv"    -0.75
" metrics"    -0.75
" picante"    -0.75
"ברים"    -0.74
" corrosive"    -0.74

Positive Logits

Token (in quotes)    Logit
" orange"    1.03
" unsightly"    0.96
" unwanted"    0.88
" bulky"    0.88
" frizz"    0.85
"umpy"    0.83
" pill"    0.81
" weird"    0.81
"orange"    0.80
" undesirable"    0.80

Activations Density 0.030%
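
To put the density figure in context, the short sketch below estimates how many token positions it corresponds to. It assumes, without confirmation from this page, that the density is the fraction of token positions in the dashboard prompt set on which the feature has a nonzero activation. The function names are illustrative only and do not belong to any library.

```python
# Figures copied from the Configuration and Activations Density lines above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.030


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of positions where the feature fires.

    ASSUMPTION: density = (active positions) / (all positions), in percent.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total token positions: {total:,}")
print(f"estimated active positions: {active:.0f}")
```

Under that assumption the feature would fire on roughly 940 of about 3.1 million token positions. The displayed density is rounded to three decimal places, so the true count may differ slightly.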