Explanations

heterosexual and straight orientations

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token               Logit
"cycline"           -0.81
"āciju"             -0.74
"újo"               -0.74
"kab"               -0.73
"ayashi"            -0.71
"enschappelijke"    -0.69
"partiet"           -0.69
"halten"            -0.69
"Mej"               -0.69
"veau"              -0.68

Positive Logits

Token               Logit
" heterosexual"     4.28
" straight"         4.06
"Straight"          3.56
"straight"          3.47
" Straight"         3.45
" heter"            3.41
"heter"             3.17
" Hetero"           3.11
" hetero"           3.11
"Hetero"            2.77
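
The ten positive-logit tokens are mostly case and leading-space variants of a few word pieces. The sketch below groups them by normalized surface form to make that visible. The token strings and values are copied from the list above; the helper name `group_by_surface_form` and the normalization rule (strip whitespace, lowercase) are illustrative assumptions, not part of the dashboard.

```python
from collections import defaultdict

# Top positive-logit tokens and values, copied from the list above.
# A leading space is part of the token string.
positive_logits = [
    (" heterosexual", 4.28),
    (" straight", 4.06),
    ("Straight", 3.56),
    ("straight", 3.47),
    (" Straight", 3.45),
    (" heter", 3.41),
    ("heter", 3.17),
    (" Hetero", 3.11),
    (" hetero", 3.11),
    ("Hetero", 2.77),
]


def group_by_surface_form(pairs):
    """Group tokens that differ only in case or surrounding whitespace.

    Hypothetical helper for illustration; the normalization rule is an
    assumption, not something the dashboard itself applies.
    """
    groups = defaultdict(list)
    for token, logit in pairs:
        groups[token.strip().lower()].append((token, logit))
    return dict(groups)


groups = group_by_surface_form(positive_logits)
for form, members in groups.items():
    best = max(logit for _, logit in members)
    print(f"{form!r}: {len(members)} variant(s), max logit {best:.2f}")
```

Under this normalization the ten tokens collapse to four forms (heterosexual, straight, heter, hetero), which is consistent with the explanation given for this feature.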

Activations Density: 0.064%
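
To give a rough sense of scale, the sketch below converts the density figure into an approximate token count. It assumes, without confirmation from the page, that the density is the fraction of tokens in the dashboard prompt set on which the feature has a nonzero activation; the variable names are illustrative.

```python
# Values taken from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.064

# Total tokens across all dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: density = share of dashboard tokens with nonzero activation.
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {round(approx_active_tokens):,}")
```

Under that assumption, the feature fires on roughly 2,000 of about 3.1 million tokens, so it is a sparse feature.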