Explanations

secure and maintain control

Configuration

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

Token          Logit
" restraint"   -0.10
" amel"        -0.10
"stor"         -0.10
"Gim"          -0.10
" restrained"  -0.10
" intercept"   -0.10
" ampl"        -0.10
" elabor"      -0.10
"emb"          -0.09
" retali"      -0.09

Positive Logits

Token          Logit
" secure"      0.17
" shore"       0.15
"secure"       0.15
" legit"       0.14
"Secure"       0.13
" Secure"      0.12
"維"            0.12
" solid"       0.12
"å·"           0.12  (incomplete UTF-8 byte sequence)
"维"            0.12
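Logit lists like these are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. Below is a minimal sketch, assuming a sparse-autoencoder feature with a decoder vector `w_dec` and an unembedding `W_U`. Both are hypothetical toy values, not this model's weights, and plain Python lists stand in for a tensor library. The dashboard may apply extra normalization that is not shown here.

```python
from typing import List, Tuple


def feature_logits(w_dec: List[float], W_U: List[List[float]]) -> List[float]:
    """Project a feature's decoder vector (length d_model) through the
    unembedding W_U (d_model rows x vocab columns): one logit per token."""
    vocab = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][t] for i in range(len(w_dec))) for t in range(vocab)]


def top_and_bottom(logits: List[float], tokens: List[str], k: int) -> Tuple[list, list]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    pairs = list(zip(tokens, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Hypothetical toy setup: d_model = 2, vocabulary of 4 tokens.
tokens = [" secure", " shore", " restraint", " amel"]
W_U = [
    [0.5, 0.3, -0.2, -0.1],  # unembedding row for residual dimension 0
    [0.1, 0.2, -0.3, -0.4],  # unembedding row for residual dimension 1
]
w_dec = [0.3, 0.2]  # hypothetical decoder direction of the feature

logits = feature_logits(w_dec, W_U)
positive, negative = top_and_bottom(logits, tokens, k=2)
print("positive:", [(t, round(v, 2)) for t, v in positive])
print("negative:", [(t, round(v, 2)) for t, v in negative])
```

Because the projection is linear and ignores the rest of the network, these values describe only the feature's direct effect on the output logits. They do not describe its full downstream influence.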

Activation Density: 0.056%
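Activation density is the fraction of sampled token positions on which the feature fires. Below is a minimal sketch, assuming that "fires" means an activation strictly above zero and that the 0.056% figure is measured over the 10,000 prompts × 128 tokens listed under Configuration. The page does not state which sample the density uses, so treat the resulting token count as illustrative only. The helper names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent: float, n_prompts: int, tokens_per_prompt: int) -> float:
    """Convert a density percentage into an approximate count of active tokens."""
    return density_percent / 100 * n_prompts * tokens_per_prompt


# Hypothetical toy activations: 2 of 8 positions are nonzero.
print(activation_density([0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0]))

# Assumption: 0.056% measured over 10,000 prompts x 128 tokens = 1,280,000 tokens.
print(round(expected_active_tokens(0.056, 10_000, 128)))
```

Under that assumption, the feature would be active on roughly 717 of the 1,280,000 sampled tokens. A feature this sparse usually has a narrow, specific trigger.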