Explanations

understand and explain

Configuration

Prompts (dashboard): 10,000 prompts, 128 tokens each

Dataset (dashboard): lmsys/lmsys-chat-1m

Negative Logits

"Normalization"    -0.09
" assim"           -0.09
" drum"            -0.09
"ught"             -0.09
"hei"              -0.09
" Schro"           -0.08
"ensis"            -0.08
"-REAL"            -0.08
" realism"         -0.08
"_mgmt"            -0.08

Positive Logits

" transparency"    0.18
"explain"          0.17
" transparent"     0.16
" explain"         0.16
"opaque"           0.15
"-transparent"     0.15
" Explain"         0.15
" opaque"          0.14
"éĢı"              0.14
"transparent"      0.13
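The page does not say how the logit lists are computed. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and report the highest- and lowest-scoring vocabulary tokens. The sketch below uses NumPy; the helper name `logit_effects`, its parameters, and the toy arrays are all hypothetical.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: direct logit effect of one feature on every token.

    decoder_dir: (d_model,) decoder vector of the feature (assumed input)
    W_U:         (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:       list of token strings, index-aligned with the columns of W_U
    """
    scores = decoder_dir @ W_U            # one score per vocabulary token
    order = np.argsort(scores)            # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 4-dim residual stream, 5-token vocabulary.
vocab = [" transparency", " assim", " the", "explain", " realism"]
W_U = np.zeros((4, 5))
W_U[0] = [0.18, -0.09, 0.05, 0.17, -0.08]
decoder_dir = np.array([1.0, 0.0, 0.0, 0.0])
positive, negative = logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)  # [(' transparency', 0.18), ('explain', 0.17)]
print(negative)  # [(' assim', -0.09), (' realism', -0.08)]
```

Under this assumption, the lists say which output tokens the feature pushes up or down directly, ignoring any effect routed through later layers.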

Activation density: 0.080%
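Activation density is usually read as the fraction of token positions on which the feature is active. The sketch below assumes that definition with a threshold of zero; the helper name `activation_density` is hypothetical, and the page does not state which token sample the 0.080% figure was measured on, so the scale check against the dashboard's 10,000 x 128-token prompt set is only an illustration.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    acts: array of per-token activations, any shape (e.g. prompts x tokens).
    """
    return float((np.asarray(acts) > threshold).mean())

# Toy activations: 2 prompts x 4 tokens; the feature fires on 2 of 8 positions.
toy = np.array([[0.0, 2.1, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.5]])
print(activation_density(toy))  # 0.25

# Hypothetical scale check: if 0.080% were measured on the dashboard sample
# (10,000 prompts x 128 tokens), the feature would fire on about this many tokens.
total_tokens = 10_000 * 128
print(total_tokens, round(total_tokens * 0.080 / 100))  # 1280000 1024
```

At 0.080%, the feature fires on roughly one token in 1,250.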