INDEX

Explanations

specifying human values

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token             Value
"腖"              0.50
"ASON"            0.46
"Hawaii"          0.46
"INGS"            0.45
"ংকা"             0.44
"丟"              0.43
"SendData"        0.43
"鉑"              0.43
"先前"            0.43
"addContainer"    0.42

Positive Logits

Token               Value
" bouts"            0.47
" distraught"       0.47
" inadvertently"    0.46
" nonchal"          0.44
"to"                0.43
" irrever"          0.43
" indiscriminate"   0.43
" leash"            0.43
" confident"        0.42
" could"            0.42

Activations Density 0.013%