Explanations

double negative, redundant phrases

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" rendelkez"          0.49
" renewables"         0.48
" conversions"        0.45
"<start_of_image>"    0.45
" flexibility"        0.44
" transitioned"       0.44
" voldo"              0.44
" synergies"          0.43
" unil"               0.43
" inkl"               0.43

Positive Logits

" poisonous"          0.47
" worsen"             0.47
"nte"                 0.46
" ಸಮಸ್ಯ"               0.45
"ׁ"                    0.45
" poisoning"          0.45
" ઘરે"                 0.44
" murderous"          0.44
" божомол"            0.43
"cı"                  0.43

Activations Density: 0.015%