Explanations

crafting malicious prompts

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

big       0.38
 Shao     0.38
rrrr      0.38
 plan     0.38
"],[      0.38
shtml     0.38
ونکي      0.38
 Pedal    0.38
?!?!      0.37
tted      0.37
Positive Logits

 vessel   0.55
 vessels  0.52
Cloud     0.51
 сосу     0.47
 Cloud    0.45
 cloud    0.45
云        0.44
 Vessel   0.44
 vess     0.43
容器      0.40
Activations Density: 0.001%