Explanations

words starting with cat or cath

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token        Logit
"Orb"        -0.11
"allee"      -0.11
"nts"        -0.10
"ORB"        -0.09
"lingen"     -0.09
"etta"       -0.09
"omat"       -0.09
" sober"     -0.09
"abyrinth"   -0.09
" اختی"      -0.09
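The raw dashboard export showed some tokens as strings such as "Ø§Ø®ØªÛĮ", "Ã©gorie" and "Ð°Ð»Ð¾Ð³". That pattern is typical of byte-level BPE vocabularies, which store each UTF-8 byte as a printable stand-in character (the GPT-2 "bytes_to_unicode" scheme). Assuming this model's tokenizer uses that scheme (the page does not say), the minimal sketch below inverts the mapping to recover readable text; decode_token and BYTE_DECODER are hypothetical names.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style map from each byte value to a printable Unicode stand-in."""
    # Bytes that are already printable characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # The remaining bytes (space, control characters, etc.) are shifted
    # to code points U+0100 and upward so every byte is visible.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Turn a byte-level vocabulary string back into readable text (hypothetical helper)."""
    out = bytearray()
    for ch in raw:
        if ch in BYTE_DECODER:
            out.append(BYTE_DECODER[ch])
        else:
            # Characters outside the table (e.g. a plain space that the dashboard
            # already substituted for the "Ġ" marker) are kept as their UTF-8 bytes.
            out.extend(ch.encode("utf-8"))
    # A single token can hold an incomplete multi-byte sequence, so don't raise.
    return out.decode("utf-8", errors="replace")


for raw in [" Ø§Ø®ØªÛĮ", "Ã©gorie", "Ð°Ð»Ð¾Ð³"]:
    print(repr(raw), "->", repr(decode_token(raw)))
```

Under that assumption the three strings decode to " اختی" (Persian script), "égorie" (as in French "catégorie") and "алог" (as in Russian "каталог"). The errors="replace" choice matters because a single token can end partway through a multi-byte character.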

Positive Logits

Token        Logit
"égorie"      0.16
"olic"        0.15
"eter"        0.14
"алог"        0.14
"walk"        0.13
"apult"       0.13
"pillar"      0.13
"amar"        0.12
"olicy"       0.12
"edral"       0.12
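Dashboards of this kind commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. The page does not document its exact method, so treat this as an assumption. A minimal pure-Python sketch with made-up toy numbers; decoder_vec, unembed and top_and_bottom_logits are hypothetical names:

```python
def top_and_bottom_logits(decoder_vec, unembed, k):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector, then return the k highest and
    the k lowest scores."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_vec, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy 2-dimensional example: these values are invented, not taken from the model.
decoder_vec = [1.0, 0.0]
unembed = {
    "apult": [0.5, 0.3],
    "walk": [0.2, -0.4],
    "Orb": [-0.3, 0.9],
}
positive, negative = top_and_bottom_logits(decoder_vec, unembed, k=1)
print("positive:", positive)
print("negative:", negative)
```

With a real model the decoder vector has d_model entries and the vocabulary has tens of thousands of tokens, so a vectorized matrix product would normally replace the dictionary comprehension; the ranking step stays the same.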

Activation Density: 0.031%
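Activation density is typically the fraction of token positions on which the feature fires (activation above zero). The page does not say which token set the 0.031% figure was measured on. If it were the 10,000 × 128 = 1,280,000 dashboard tokens, it would correspond to roughly 397 active tokens. A minimal sketch, assuming a zero threshold; activation_density is a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of prompts, each a list of per-token values.
    """
    total = sum(len(prompt) for prompt in activations)
    active = sum(1 for prompt in activations for a in prompt if a > threshold)
    return active / total


# Toy data: 2 prompts x 4 tokens, feature active on 1 of 8 positions.
toy = [[0.0, 0.0, 2.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(f"{activation_density(toy):.3%}")

# Back-of-envelope count, assuming the density was measured on the dashboard prompts.
n_tokens = 10_000 * 128
print(n_tokens, round(n_tokens * 0.031 / 100))
```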