Explanations

references to undesirable or problematic elements

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to preserve leading spaces)    Logit
" ExecuteAsync"                              -0.85
":✨"                                         -0.75
"tvguidetime"                                -0.74
" unwanted"                                  -0.63
"ssohn"                                      -0.58
"nachron"                                    -0.57
"stuffs"                                     -0.57
" themſelves"                                -0.56
" Hano"                                      -0.55
"ſelves"                                     -0.54

Positive Logits

Token (quoted to preserve leading spaces)    Logit
" undes"                                     1.37
" coaches"                                   1.04
" Coaches"                                   1.00
"Coaches"                                    0.94
" homeowners"                                0.73
" CHtml"                                     0.67
"undes"                                      0.67
" referenties"                               0.65
" aryl"                                      0.65
"+#+#"                                       0.64

Activations Density 0.003%
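
To put the density figure in perspective, the following is a minimal sketch that combines it with the prompt configuration listed under Configuration. It assumes that "Activations Density" means the fraction of dashboard tokens on which the feature has a nonzero activation, and that the displayed 0.003% is a rounded value. Both are assumptions rather than something this page states, and the function names are hypothetical.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.003  # as displayed; assumed to be rounded


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def approx_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of tokens on which the feature fires.

    Assumes density is the fraction of tokens with nonzero activation.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = approx_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total}")
print(f"approx. active tokens: {round(active)}")
```

Under these assumptions the dashboard covers about 3.1 million tokens, and the feature fires on roughly a hundred of them, so any top-activation examples are drawn from a very small sample.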