INDEX

Explanations

circle and circles

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token             Logit
" cubes"          -0.14
"Cube"            -0.12
" rectangles"     -0.12
"Cub"             -0.12
" triangular"     -0.11
" Pyramid"        -0.11
" Cube"           -0.11
" rectangular"    -0.11
" pyramid"        -0.11
"berger"          -0.11

Positive Logits

Token             Logit
" circle"         0.51
" Circle"         0.43
"circle"          0.39
" circles"        0.38
" circular"       0.38
"Circle"          0.38
"-circle"         0.34
".circle"         0.34
"圆"              0.33
" concent"        0.31
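
One positive-logit token shows up in raw tokenizer output as `åľĨ`. That spelling is consistent with a byte-level BPE vocabulary, where each UTF-8 byte is displayed as a stand-in printable character. The sketch below assumes the model's tokenizer uses the GPT-2 style byte-to-character table (an assumption; the page does not name the tokenizer). The helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the dashboard.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte to a printable character.

    Assumption: the tokenizer behind this dashboard uses this mapping.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(0x21, 0x7F))    # '!' .. '~'
        + list(range(0xA1, 0xAD))  # inverted '!' .. not-sign
        + list(range(0xAE, 0x100)) # registered-sign .. y-diaeresis
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Turn a raw vocabulary string back into the text it stands for."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The raw token from the positive-logits list, written with escapes.
    print(decode_token("\u00e5\u013e\u0128"))
```

Under that assumption the three characters map to the bytes E5 9C 86, which is the UTF-8 encoding of 圆 (Chinese for "circle" or "round"), in line with the other tokens in the list.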

Activations Density: 0.112%