Explanations

describing unpleasant qualities

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Logit   Token (quoted to show leading spaces)
-1.00   " delicioso"
-0.98   "itement"
-0.97   " مفید"
-0.97   "moderate"
-0.95   "ņš"
-0.94   " nien"
-0.93   " miſ"
-0.93   "mability"
-0.93   " masas"
-0.93   "皷"

Positive Logits

Logit   Token (quoted to show leading spaces)
1.29    " oily"
1.05    " sticky"
1.05    " noisy"
1.00    " buggy"
0.95    "有很多"
0.94    " woody"
0.90    " sloppy"
0.90    "has"
0.89    " windy"
0.89    " foggy"

Activations Density 0.052%
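
To put the density figure in context, the sketch below estimates how many tokens in the dashboard sample would activate this feature. It assumes that "Activations Density" means the fraction of sampled tokens with a nonzero activation, and that it was measured over the full 24,576 x 128 token sample listed under Configuration; neither assumption is confirmed by the page, and the variable names are illustrative only.

```python
# Rough estimate of how many sampled tokens activate the feature.
# Assumption (not confirmed by the page): density is the fraction of
# tokens with nonzero activation, measured over the whole prompt sample.
NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.052     # from "Activations Density"

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:,}")
```

Under these assumptions the sample holds 3,145,728 tokens, of which roughly 1,636 activate the feature.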