INDEX

Explanations

distinct concepts and actions

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" méridionale"  0.29
" dikdört"  0.28
" Toolbar"  0.28
" ukulele"  0.28
"ωτερ"  0.28
"㬹"  0.27
" Updater"  0.26
" onderwerp"  0.26
" તરી"  0.26
" తిన"  0.26

Positive Logits

" wodurch"  0.36
" mantiene"  0.34
" evitando"  0.34
"reduce"  0.33
" menghindari"  0.33
" zaidi"  0.33
" processo"  0.32
" reduces"  0.30
" avoiding"  0.30
"ಪ"  0.30

Activations Density 0.199%