Explanations

Answer questions

Sentences in which the assistant asserts its role and its safety/policy boundaries (self-description and refusal/explanation phrasing).

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token                 Value
"olio"                0.71
"hé"                  0.65
"ल"                   0.64
"тың"                 0.61
(no visible token)    0.60
"ن"                   0.59
"But"                 0.57
"kannya"              0.57
"Other"               0.57
" Other"              0.56

Positive Logits

Token                 Value
"<unused2222>"        0.71
" помощью"            0.64
" medicamento"        0.64
" आदान"               0.61
" chatbots"           0.61
" utilizado"          0.60
" finalText"          0.59
" 역할을"              0.59
" mediante"           0.59
" seul"               0.59

Activations Density: 4.229%
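
To give the density figure a sense of scale, the sketch below combines it with the prompt count and prompt length listed under Configuration. It assumes that "activations density" means the fraction of token positions on which the feature has a nonzero activation; the page does not state that definition. The helper `activation_density` and the constant names are hypothetical, for illustration only.

```python
# Figures taken from the dashboard page.
PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY = 0.04229  # 4.229%

# Total token positions in the dashboard dataset.
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# Assumption: density is the share of token positions with a nonzero
# activation, so this estimates how many positions the feature fires on.
active_tokens = round(total_tokens * DENSITY)


def activation_density(activations):
    """Return the fraction of entries strictly greater than zero.

    `activations` is a list of per-prompt lists of activation values.
    Hypothetical helper, not part of any dashboard API.
    """
    flat = [a for row in activations for a in row]
    if not flat:
        return 0.0
    return sum(1 for a in flat if a > 0) / len(flat)


if __name__ == "__main__":
    print(f"total token positions: {total_tokens:,}")
    print(f"estimated active positions: {active_tokens:,}")
```

Under that assumption, the feature would be active on roughly 5.16 million of about 121.9 million token positions.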