Explanations

are sexually suggestive

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"Art"  0.44
" فول"  0.36
"Art"  0.35
" Timor"  0.35
"Energie"  0.35
"dm"  0.34
"CRETE"  0.34
"湿"  0.34
"НИЕ"  0.34
"ဘူး"  0.33

Positive Logits

"GPT"  0.38
" sexism"  0.37
"GPT"  0.36
" অবশ্যই"  0.36
"rency"  0.35
" Judd"  0.35
"GBT"  0.34
"MANY"  0.34
" Spoiler"  0.34
"mi"  0.34

Activations Density 0.022%