INDEX

Explanations

danger and harm to self


Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1


Negative Logits (token: value; quotes preserve leading spaces)

"以及": 0.43
"などの": 0.42
"以外": 0.41
"그리고": 0.40
"および": 0.40
" および": 0.40
"ือน": 0.39
"및": 0.38
"AND": 0.38
" naro": 0.38

Positive Logits (token: value; quotes preserve leading spaces)

" both": 2.38
"both": 2.22
" både": 2.05
"Both": 1.90
" zowel": 1.90
" sowohl": 1.88
" zarówno": 1.88
" Both": 1.86
" cả": 1.82
"ทั้ง": 1.78
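The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes each token's logit up or down. The page does not say how these lists are computed. A common approach for SAE features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and take the largest and smallest entries. In this minimal sketch, `top_logit_tokens`, `decoder_dir`, `W_U`, and the toy vocabulary are all hypothetical illustrations, not the dashboard's actual code or data.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    decoder_dir: (d_model,) decoder vector for the feature (hypothetical input).
    W_U: (d_model, vocab_size) unembedding matrix.
    vocab: list of token strings, indexed like the columns of W_U.
    """
    logits = decoder_dir @ W_U       # (vocab_size,) logit contribution per token
    order = np.argsort(logits)       # token indices sorted from lowest to highest
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with made-up numbers: 2-dim residual stream, 5-token vocabulary.
vocab = [" both", " and", "以及", " sowohl", " the"]
W_U = np.array([[2.0, -0.5, -1.0, 1.5, 0.1],
                [0.0,  0.0,  0.0, 0.0, 0.0]])
feature_dir = np.array([1.0, 0.0])

positive, negative = top_logit_tokens(feature_dir, W_U, vocab, k=2)
print(positive)  # [(' both', 2.0), (' sowohl', 1.5)]
print(negative)  # [('以及', -1.0), (' and', -0.5)]
```

The sketch returns signed values. It ranks by the feature's direct effect on the logits only and ignores any downstream layers.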

Activation Density: 0.306%
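Activation density is usually read as the fraction of token positions on which the feature fires, that is, where its activation is above zero. This minimal sketch computes that fraction and is an assumption, not the dashboard's documented definition. It also computes the size of the prompt sample stated on this page (238,145 prompts of 512 tokens each). The function name `activation_density`, its `threshold` parameter, and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy activations with made-up values: 2 prompts x 4 tokens, 2 nonzero entries.
toy_acts = [[0.0, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0, 1.2]]
density = activation_density(toy_acts)
print(f"{density:.1%}")  # 25.0%

# Number of token positions in the prompt sample described on this page.
total_tokens = 238_145 * 512
print(total_tokens)  # 121930240
```

Suppose the 0.306% figure were measured over that entire sample, which the page does not confirm. It would then correspond to roughly 373,000 token positions.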