INDEX

Explanations

model starting response with okay

the start of the assistant/model’s reply, i.e., the first content token right after the model turn marker.

New Auto-Interp

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Embeds

IFrame

Link

Not in Any Lists

Negative Logits

 occur

0.28

 compds

0.28

 axisymmetric

0.28

 acyl

0.28

 deliverables

0.28

 denomin

0.27

 aberr

0.27

 droplets

0.27

ྭ

0.27

 axially

0.27

POSITIVE LOGITS

Okay

0.66

Oh

0.64

 Okay

0.63

Hmm

0.63

Ah

0.62

Wow

0.62

Ah

0.60

Oh

0.60

Hmm

0.60

 okay

0.56

Activations Density 0.050%