Explanations

spam, abusive, or offensive content

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
" meis"        -1.01
" laziness"    -1.00
" goku"        -0.95
"jep"          -0.91
" nerds"       -0.91
" suzuki"      -0.90
" itali"       -0.90
" labrador"    -0.90
" vasco"       -0.90
" versace"     -0.89

Positive Logits

Token          Logit
" spam"        2.33
" hate"        1.97
" racist"      1.81
" abusive"     1.80
" malicious"   1.79
" offensive"   1.75
" harmful"     1.72
"spam"         1.70
" porn"        1.68
"bad"          1.66

Activations Density 0.111%
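
As a rough sanity check on the figures above, the sketch below estimates how many tokens this feature fires on. It assumes the density is the fraction of tokens with a nonzero activation, and that it was measured over the 24,576 dashboard prompts of 128 tokens each. The page states neither assumption explicitly, and the variable names are illustrative only.

```python
# Figures copied from the dashboard.
num_prompts = 24_576
tokens_per_prompt = 128
activation_density_pct = 0.111  # percent

# Assumption: density = (tokens with nonzero activation) / (total tokens).
total_tokens = num_prompts * tokens_per_prompt
estimated_active_tokens = total_tokens * activation_density_pct / 100

print(f"total tokens:            {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

Under these assumptions the feature is sparse: it would activate on a few thousand tokens out of roughly 3.1 million.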