Explanations

negative or limiting phrases

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"’"  0.56
(token text missing)  0.52
(token text missing)  0.45
"re"  0.44
" representative"  0.44
"was"  0.44
" الطب"  0.43
"Inc"  0.43
"ı"  0.41
"rm"  0.40

Positive Logits

"letzt"  0.50
"geke"  0.48
" attaque"  0.48
" dépl"  0.47
"嶈"  0.47
"জেল"  0.47
"gekehrt"  0.47
"イタリア"  0.47
"リア"  0.46
"pozn"  0.46

Activations Density 0.001%
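
To put the density figure in context, the short sketch below combines it with the prompt counts listed under Configuration. It assumes that "Activations Density" is the fraction of token positions on which the feature has a nonzero activation, which the page itself does not state. The function names are hypothetical and chosen only for illustration.

```python
# Figures copied from the dashboard text.
NUM_PROMPTS = 238_145        # "238,145 prompts"
TOKENS_PER_PROMPT = 512      # "512 tokens each"
DENSITY_PERCENT = 0.001      # "Activations Density 0.001%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Rough count of token positions where the feature fires.

    ASSUMPTION: density is the share of token positions with a
    nonzero activation; the dashboard does not define the term.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"{total:,} token positions in total")
print(f"roughly {active:,.0f} positions with a nonzero activation")
```

Under that assumption the feature fires on only about a thousand of the roughly 122 million token positions, so it is very sparse.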