Explanations

searching HTML tables for phrases

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" Dorm"        -0.12
" bald"        -0.09
" Gender"      -0.09
" Beaut"       -0.09
"esian"        -0.09
"cal"          -0.09
" Garland"     -0.09
" genders"     -0.09
"inka"         -0.08
" sage"        -0.08

Positive Logits

Token          Logit
" needle"      0.17
" Needle"      0.14
"needle"       0.13
" needles"     0.12
" Spotlight"   0.11
" kiếm"        0.11
"thal"         0.11
"olec"         0.10
" patterns"    0.10
"grep"         0.10

Activations Density: 0.097%
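
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It rests on two assumptions that the page does not confirm: that density means the fraction of scanned tokens on which the feature has a nonzero activation, and that it was measured over the same 10,000 prompts of 128 tokens listed under Configuration. The variable names are illustrative only.

```python
# Figures copied from the dashboard: prompt count, tokens per prompt, density.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.097

# Assumption: density = (tokens with nonzero activation) / (all tokens scanned),
# measured over the dashboard prompts. The page itself does not state this.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"{total_tokens:,} tokens scanned")
print(f"about {approx_active_tokens:,} tokens with nonzero activation")
```

Under those assumptions the feature fires on roughly 1,240 of 1,280,000 tokens, which is consistent with a sparse, narrowly targeted feature.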