Explanations

explaining refusal or reporting

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" ஹீரோ"  0.42
"ളി"  0.38
" Blue"  0.37
"ähr"  0.36
" обслужи"  0.36
" 사용하여"  0.34
"<unused2222>"  0.33
"⑷"  0.33
"Blue"  0.33
" আমর"  0.33

Positive Logits

"mentioned"  0.63
" mentioned"  0.58
"indeed"  0.57
" indeed"  0.57
" mencionado"  0.52
"確かに"  0.51
"とのこと"  0.50
" mencion"  0.49
" mencionados"  0.49
" erwäh"  0.48

Activation Density: 0.518%
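Activation density summarizes how often this feature fires across the dashboard's prompts. The sketch below is a minimal illustration, assuming density is the fraction of token positions whose activation is above zero. The function name `activation_density` and the zero threshold are hypothetical and are not taken from the dashboard's own implementation.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a flat sequence of per-token activation values for a
    single feature. The default zero threshold is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    # Count positions where the feature is considered "active".
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations for 8 tokens; the feature fires on 2 of them.
    acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
    print(f"{activation_density(acts):.3%}")  # 25.000%
```

Under this definition, a density of 0.518% means the feature is active on roughly 1 token in every 193, so it fires sparsely.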