
Explanations

unhealthy or dangerous things


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1


Negative Logits

artige            0.45
ล                 0.44
と                0.44
 ਅਤੇ              0.41
 Kommunikation    0.41
 mediating        0.40
ه                 0.40
 Kyoto            0.40
 ამ               0.39
is                0.39

Positive Logits

ásico             0.44
ácil              0.43
スナー            0.41
 satisf           0.39
ANT               0.39
 prerog           0.39
球员              0.39
🏖                0.38
 spree            0.38
inching           0.38

Activation Density: 0.005%