INDEX

Explanations

gender-neutral pronoun constructions

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token           Logit
" Lucid"        -0.88
"㌔"            -0.83
"keydown"       -0.82
" dynamo"       -0.82
"だいぶ"        -0.81
"Noo"           -0.81
"стюм"          -0.80
" getS"         -0.80
" arranger"     -0.80
" erő"          -0.80

Positive Logits

Token           Logit
"her"           1.01
"He"            0.91
"or"            0.90
"has"           0.87
"her"           0.87
" ordinaires"   0.80
"he"            0.79
"ster"          0.79
" Stadt"        0.79
" After"        0.78

Activations Density 0.008%
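
To give a sense of scale for the density figure, the arithmetic below estimates how many dashboard tokens the feature would fire on. This is a rough sketch under two assumptions that the page itself does not state: that "activations density" means the fraction of tokens with a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration. The variable names are illustrative only.

```python
# Figures taken from the dashboard configuration above.
num_prompts = 24_576
tokens_per_prompt = 128

# Reported activations density, as a percentage.
density_percent = 0.008

# Total tokens in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt

# Assumption: density is the fraction of tokens on which the feature is active.
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:.0f}")
```

If those assumptions hold, the feature is active on only a few hundred of roughly three million tokens, which fits a sparse feature tied to a narrow construction.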