Explanations

how someone behaves or responds

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token          Logit
" получаем"    0.40
"enders"       0.39
"割引"         0.39
" получать"    0.39
" получили"    0.38
"キ"           0.38
" 그거"        0.38
"찾"           0.38
" 받았"        0.38
" शर्म"         0.37

Positive Logits

Token          Logit
" whom"        0.75
"whom"         0.68
" responds"    0.57
" behaves"     0.57
" respond"     0.55
"してくれる"   0.55
" reciproc"    0.54
" behaving"    0.54
" behaved"     0.52
" behave"      0.51

Activations Density 0.022%
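
To put the density figure in context, here is a minimal sketch that converts it into an approximate token count. It assumes that the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the full dashboard prompt set listed under Configuration. Neither assumption is stated on the page, and the function name is hypothetical.

```python
# Figures copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.022


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens where the feature fires).

    Assumption: density is the share of all dashboard tokens with a
    nonzero activation, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


total, active = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under those assumptions the feature would fire on roughly 27 thousand of about 122 million tokens. Because the displayed density is rounded to two significant figures, the estimate is only accurate to a similar precision.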