Explanations

truth and fully

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

"ifest"       -0.09
"ised"        -0.09
"ावर"         -0.09
"ohen"        -0.09
"gree"        -0.09
"gage"        -0.08
" freezing"   -0.08
"'gc"         -0.08
" folk"       -0.08
"ago"         -0.08

Positive Logits

"fully"        0.22
"fulness"      0.20
"iness"        0.17
"/false"       0.15
"ayers"        0.14
"ulence"       0.13
" serum"       0.13
" Serum"       0.12
".assertThat"  0.12
"ulent"        0.12

Activations Density 0.018%
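
As a rough reading of the density figure, the sketch below estimates how many tokens in the dashboard prompt set the feature would fire on. It assumes that the density is the fraction of all dashboard tokens with a nonzero activation, and that every prompt contributes exactly 128 tokens; neither is stated on this page, and the function name is hypothetical.

```python
# Figures copied from the Configuration and Activations Density readouts.
NUM_PROMPTS = 10_000
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.018


def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> float:
    """Estimate the number of tokens on which the feature is active.

    Assumption (not confirmed by the page): density is the share of all
    dashboard tokens with a nonzero activation, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100.0


total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
active = expected_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"total tokens: {total_tokens}")
print(f"estimated active tokens: {round(active)}")
```

Under those assumptions the feature is active on only a few hundred of the roughly 1.28 million tokens, which fits a sparse feature.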