Explanations

ironic or unexpected states

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token	Logit
لم	0.33
ار	0.33
Aplic	0.32
Latest	0.32
ujourd	0.32
ור	0.32
컴	0.32
ണ്ണ	0.31
Я	0.31
۰	0.31

Positive Logits

Token	Logit
 ironically	0.41
 fraudulently	0.40
 symbolically	0.40
 Ironically	0.39
 ironic	0.37
 특히	0.36
 mediates	0.36
 oddly	0.36
 ủng	0.36
 conceivably	0.36

Activations Density 0.159%
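
To put the density figure in context, the following is a rough sketch of the arithmetic it implies. It rests on two assumptions that the page does not state: that "Activations Density" means the fraction of token positions on which the feature has a nonzero activation, and that it was measured over the same dashboard prompt set listed under Configuration. The function and constant names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Figures copied from the dashboard text above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY = Fraction(159, 100_000)  # 0.159%, kept exact to avoid float rounding


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: Fraction) -> int:
    """Estimated number of token positions where the feature fires.

    ASSUMPTION: density is the fraction of token positions with a nonzero
    activation, measured over this same prompt set. The result is rounded down.
    """
    return int(total * density)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    print(f"total token positions: {total:,}")
    print(f"estimated active positions: {estimated_active_tokens(total, DENSITY):,}")
```

Under those assumptions the feature would fire on roughly 194 thousand of about 122 million token positions, which is consistent with a sparse feature tied to a narrow set of words such as "ironically".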