Explanations

No Explanations Found

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"就像"  0.42
" $_{\"  0.36
" Alain"  0.36
" desal"  0.36
"wege"  0.36
"ორის"  0.36
"рах"  0.34
"찍"  0.34
"接"  0.34
"COCH"  0.34

Positive Logits

" truth"  0.51
" definitive"  0.50
" COMPLETE"  0.50
" Disturb"  0.47
" Average"  0.46
"basics"  0.45
"COMPLETE"  0.45
" disturbing"  0.45
" distinction"  0.45
" conclusive"  0.45

Activations Density: 0.002%

No Known Activations