Explanations

simple encoding method

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token            Logit
" Decrypt"       -0.19
"Decrypt"        -0.18
"_decrypt"       -0.16
" decrypted"     -0.16
" decryption"    -0.15
"decrypt"        -0.14
" decrypt"       -0.14
".decrypt"       -0.11
"typ"            -0.10
"encrypted"      -0.10

Positive Logits

Token            Logit
"enc"            0.22
" encoding"      0.22
" coding"        0.18
" schemes"       0.18
" encode"        0.18
" Encoding"      0.17
"编"             0.16
"encoding"       0.16
" 编"            0.15
" compress"      0.15
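
The token 编 (the first character of 编码, "encoding") is displayed as ç¼ĸ when a byte-level BPE vocabulary is printed without decoding. The sketch below shows how such a string can be mapped back to readable text. It assumes the tokenizer uses the GPT-2-style byte-to-character table; the source does not name the model or tokenizer, so that is an assumption, and the helper names bytes_to_unicode and decode_token are illustrative.

```python
def bytes_to_unicode():
    """Build the GPT-2-style table from byte values to printable characters.

    Assumption: the dashboard's tokenizer follows this byte-level BPE scheme.
    Printable bytes map to themselves; all others are shifted to code points
    starting at 256 so that every byte has a visible stand-in character.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token):
    """Convert a raw byte-level BPE token string back to UTF-8 text."""
    unicode_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(unicode_to_byte[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ç¼ĸ"))  # 编
```

A leading space in a token corresponds to the stand-in character Ġ in the raw vocabulary, so a token copied from the dashboard with a literal leading space needs that space stripped or replaced with Ġ before it is passed to decode_token.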

Activations Density 0.113%