Explanations

model responses

the beginning of the assistant’s reply in a dialogue (the assistant turn marker or first token of the model’s message).

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token                Value
" regressions"       0.47
" opérations"        0.46
" വിൽപ്പന"            0.45
" variété"           0.45
" auteurs"           0.45
" radiographs"       0.45
"ÜR"                 0.44
"鸮"                 0.44
" ফ্যাস"              0.43
" collaborateurs"    0.42

Positive Logits

Token       Value
" sorry"    0.69
"sorry"     0.69
"plz"       0.54
"Sorry"     0.49
" Sorry"    0.49
"hehe"      0.49
"pls"       0.47
"they"      0.47
"answer"    0.46
"you"       0.46

Activations Density 0.007%
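
To give a sense of scale for the figures above, here is a minimal Python sketch that combines the prompt configuration with the activation density. It rests on two assumptions that the page does not state: that every prompt contributes the full 512 tokens, and that the density is the fraction of those tokens on which the feature has a nonzero activation. The function names are hypothetical and chosen only for illustration.

```python
# Figures quoted from the dashboard text above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY_PERCENT = 0.007


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token count, assuming every prompt has the full length."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated number of tokens with a nonzero activation.

    Assumes the density is a percentage of all tokens in the dataset.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under these assumptions the dataset holds roughly 122 million tokens, of which the feature would fire on about 8,500. If the density was measured on a different sample, or if prompts are shorter than 512 tokens, the estimate changes accordingly.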