Explanations

lists following specific keywords

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

NEGATIVE LOGITS

" violating"     0.45
" complying"     0.44
" justifies"     0.43
" setup"         0.42
" office"        0.41
" justifying"    0.39
"５"             0.38
" infringing"    0.38
" responsive"    0.38
"🆕"             0.38

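POSITIVE LOGITS

" великолеп"     0.52   (Russian, stem of "magnificent")
" tcpHeader"     0.46
" idxf"          0.45
" heartily"      0.45
" maravilh"      0.44   (Portuguese, stem of "wonderful")
" хорошо"        0.44   (Russian, "good, well")
" القلب"         0.44   (Arabic, "the heart")
"豐富"           0.43   (Chinese, "rich, abundant")
" rzeczy"        0.43   (Polish, "things")
"Mox"            0.43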

Activation density: 0.004%