INDEX

Explanations

respect for all

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

 ActionTypes

-0.10

 wreck

-0.09

〃

-0.09

wre

-0.09

Sap

-0.09

 Brut

-0.09

;/*

-0.08

 serif

-0.08

zano

-0.08

);$

-0.08

Positive Logits

forall

0.14

 everyone

0.13

 forall

0.12

 others

0.12

everyone

0.12

 towards

0.11

 vůči

0.11

 toward

0.11

 bagi

0.10

 всех

0.10

Activations Density 0.053%
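
To put the density figure in context, the following is a minimal sketch of the arithmetic, assuming the density is the fraction of all dashboard token positions (10,000 prompts of 128 tokens each) on which the feature has a nonzero activation. That reading is an assumption, not something the page states, and the function name `expected_active_tokens` is hypothetical.

```python
# Figures copied from the dashboard configuration and the density readout.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.053


def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate how many token positions activate the feature.

    Assumption (not confirmed by the source): density is the share of all
    dashboard token positions with a nonzero activation.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


total = NUM_PROMPTS * TOKENS_PER_PROMPT
active = expected_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT,
                                ACTIVATION_DENSITY_PERCENT)
print(f"{total:,} tokens in total, roughly {active:.0f} with nonzero activation")
```

Under that assumption the feature would fire on only a few hundred of the 1.28 million token positions, which fits a narrow feature with the explanation "respect for all".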