Explanations

discouraged or prohibited

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token            Logit
" incapa"        -0.86
"⌞"              -0.85
"Gez"            -0.85
"ึก"              -0.85
"衩"             -0.84
" predictions"   -0.82
"桧"             -0.81
" sulfu"         -0.81
"sibly"          -0.79
" impot"         -0.79

Positive Logits

Token            Logit
" frowned"       3.56
" discouraged"   3.02
" prohibited"    2.41
" forbidden"     2.34
" taboo"         2.14
" discourage"    2.03
" frown"         2.02
"forbidden"      1.85
" disapproved"   1.80
" banned"        1.79

Activations Density 0.118%
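
As a rough sanity check on the density figure, the sketch below converts it into an approximate token count. It assumes that "Activations Density" means the fraction of dashboard tokens on which the feature has a non-zero activation, and that the density was measured over the full dashboard prompt set listed under Configuration. Neither assumption is stated on this page, and the variable names are illustrative only.

```python
# Figures taken from the page: 24,576 prompts of 128 tokens each,
# and an activation density of 0.118%.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.118

# Total tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: density = (tokens with non-zero activation) / (total tokens).
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under these assumptions the feature fires on only a few thousand of the roughly 3.1 million dashboard tokens, which fits a sparse feature tied to a narrow concept such as "discouraged or prohibited".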