INDEX

Explanations

model refusals

model refusal

model speaking

model output

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 KMnO    0.41
顛    0.40
Occ    0.38
occ    0.36
 khấu    0.36
 निशान    0.35
 insinu    0.35
Locks    0.35
 vistazo    0.34
રો    0.34

Positive Logits

 Кан    0.39
Dan    0.39
 Edel    0.39
郏    0.38
 weiterhin    0.37
 Danh    0.37
ভালো    0.37
anch    0.37
 Angelo    0.37
ajax    0.36

Activations Density 0.050%