INDEX

Explanations

content moderation policies

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 planning

-0.26

脾胃 (spleen and stomach)

-0.26

 intimately

-0.25

logan

-0.25

 energ

-0.25

timing

-0.25

统筹 (overall planning)

-0.25

.timing

-0.25

 consultants

-0.24

ogi

-0.24

Positive Logits

 spam

0.44

违规 (rule violation)

0.40

 flagged

0.40

 censorship

0.40

审核 (review)

0.38

spam

0.35

骚扰 (harassment)

0.34

可疑 (suspicious)

0.34

 banned

0.33

 suspicious

0.33

Activations Density 0.157%
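
To give the density figure a sense of scale, here is a minimal sketch that assumes density means the fraction of token positions on which the feature has a nonzero activation. Whether the 0.157% was measured on the dashboard prompts listed under Configuration is also an assumption; the function name `activation_density` is hypothetical.

```python
# Values taken from the Configuration section of this page.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY = 0.157 / 100  # 0.157%

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
# Rough count of activating tokens IF density was measured on these prompts.
approx_active_tokens = round(total_tokens * DENSITY)


def activation_density(activations):
    """Fraction of token positions with activation > 0 (assumed definition)."""
    flat = [a for prompt in activations for a in prompt]
    return sum(1 for a in flat if a > 0) / len(flat)
```

With these numbers, `total_tokens` is 2,097,152 and `approx_active_tokens` is 3,293, so under these assumptions the feature fires on roughly one token in 640.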