Explanations

names or stereotypes

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

" Świę"          0.62
" Sungai"        0.60
" Grün"          0.59
" asistentes"    0.57
"Ż"              0.56
" Lebens"        0.56
" آه"            0.55
"Deposito"       0.55
"Պ"              0.55
"Ż"              0.55

Positive Logits

" cliche"          0.80
" clichés"         0.77
" Rector"          0.75
" cliché"          0.75
" stereotypical"   0.72
" stereotypes"     0.71
" McDaniel"        0.70
" stereotype"      0.68
" Byrd"            0.66
" stereotyp"       0.65

Activations Density 0.139%