Explanations

prefix indicating negation or negativity

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"用"  0.35
"非常"  0.34
"和"  0.33
" большая"  0.33
"上"  0.33
" повністю"  0.32
"の"  0.31
" варианты"  0.31
"日"  0.30
"到"  0.30

Positive Logits

" disagreeable"  0.31
"ot"  0.28
"volence"  0.28
"matory"  0.27
"emia"  0.27
"iosity"  0.27
"otoxin"  0.27
" dyspe"  0.26
"aclysm"  0.26

Activations Density 0.419%
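
The dashboard does not say how the density figure is computed. One common reading is the fraction of token positions on which the feature is active, and the sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions divided by all positions.
    `activations` holds one sequence of per-token activation values per prompt.
    """
    active = 0
    total = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    if total == 0:
        raise ValueError("no token positions supplied")
    return active / total


# Toy input (made up): 2 prompts of 4 tokens each, one active position.
toy = [[0.0, 0.0, 1.7, 0.0], [0.0, 0.0, 0.0, 0.0]]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.419% would mean the feature is active on roughly 4 of every 1,000 token positions in the sampled prompts.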