Explanations

set of rules

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
"ampo"           -0.09
"utenberg"       -0.09
" subclasses"    -0.09
"Ago"            -0.09
"IGHL"           -0.09
" Literal"       -0.08
"itere"          -0.08
"acman"          -0.08
" Gesture"       -0.08
"estroy"         -0.08

Positive Logits

Token            Logit
" typing"        0.15
" Hind"          0.15
" type"          0.15
" polym"         0.15
"/type"          0.14
" Type"          0.14
" Pierce"        0.13
"typing"         0.13
"Ty"             0.13
"Typ"            0.12

Activations Density 0.051%
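
To give the density figure a sense of scale, here is a rough back-of-the-envelope sketch. It assumes that density means the fraction of scanned tokens on which the feature has a nonzero activation, and that it was measured over the prompt set listed under Configuration. Neither assumption is stated on the page, and the reported percentage is rounded, so the resulting count is only approximate.

```python
# Assumption: density = (tokens with nonzero activation) / (total tokens scanned),
# measured over the dashboard prompt set. This is not confirmed by the page itself.
PROMPTS = 10_000          # from Configuration: 10,000 prompts
TOKENS_PER_PROMPT = 128   # from Configuration: 128 tokens each
DENSITY_PERCENT = 0.051   # reported activation density, already rounded

total_tokens = PROMPTS * TOKENS_PER_PROMPT
estimated_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens scanned:        {total_tokens:,}")
print(f"estimated activating tokens: {estimated_active_tokens:,}")
```

Under these assumptions the feature fires on only a few hundred of the roughly 1.28 million tokens scanned, which fits a sparse feature tied to a narrow family of tokens such as the "type" variants in the positive logits.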