Explanations

exploit, exploiting, exploitation

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

𝐤  1.53
 insuff  1.35
ই  1.30
ために  1.27
チェ  1.26
 wody  1.26
ké  1.26
ल्ट  1.23
 groe  1.23
 fratt  1.22

Positive Logits

е  1.59
 گیری  1.57
ｬ  1.46
⃣  1.40
х  1.40
⺈  1.38
檚  1.35
ላል  1.35
 vocal  1.35
Ɖ  1.32

Activations Density: 0.001%