INDEX

Explanations

male names followed by surnames or titles

the presence of person names (named-entity tokens identifying people).

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 petals       0.36
 bruises      0.35
 Elektrokh    0.34
 Justiça      0.32
 actresses    0.32
 hazelnuts    0.32
 breasts      0.31
 Yatha        0.31
 ERROR        0.31
 SAMP         0.31

Positive Logits

son     0.37
í       0.36
ina     0.35
ian     0.35
ů       0.34
िया     0.34
å       0.34
        0.33
о       0.32
sson    0.32

Activations Density 0.078%
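
To put the density figure in context, here is a minimal sketch that converts it into an approximate token count. It assumes that "Activations Density" means the fraction of tokens on which the feature is active, and that it was measured over the same prompt set listed under Configuration. Neither assumption is stated on the dashboard, and the function names are hypothetical.

```python
# Figures copied from the dashboard; everything else is illustrative.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY = 0.078 / 100  # 0.078% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> float:
    """Estimated number of tokens on which the feature is active.

    Assumption: density = active tokens / total tokens.
    """
    return total * density


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, ACTIVATION_DENSITY)
    print(f"{total:,} tokens in the prompt set")
    print(f"about {active:,.0f} tokens with a nonzero activation (estimate)")
```

Under these assumptions the feature is active on roughly one token in 1,300, which would fit a feature tied to a specific token class such as person names.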