INDEX

Explanations

how to

tokens that are part of user instructions or explicit task/request prompts (i.e., directive phrases asking the model to do something).

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1
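The Configuration figures imply a total number of token positions in the dashboard prompt set. Below is a minimal sketch of that arithmetic. It assumes every prompt is exactly 512 tokens long (padded or truncated), which the page implies but does not state outright. The helper name `total_tokens` is hypothetical.

```python
# Figures from the Configuration section of this dashboard.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions, assuming every prompt is exactly tokens_per_prompt long."""
    return num_prompts * tokens_per_prompt


if __name__ == "__main__":
    print(f"{total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT):,}")  # prints 121,930,240
```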

Negative Logits

"to"        0.53
"of"        0.52
"auf"       0.48
"ă"         0.48
" från"     0.48
"ão"        0.46
"een"       0.46
" على"      0.45
" with"     0.43
" của"      0.43

Positive Logits

[token not shown]   0.44
"ის"                0.43
" Dave"             0.43
[token not shown]   0.41
"ございます"         0.41
[token not shown]   0.40
"티"                0.39
"ке"                0.39
"정에"              0.38
"ג"                 0.38
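The dashboard does not say how the positive and negative logit lists are computed. One common approach for a learned feature (for example a sparse-autoencoder latent) is assumed here rather than confirmed by this page. You project the feature's decoder direction through the model's unembedding matrix W_U, then report the k highest and k lowest scores. The pure-Python sketch below ignores any final layer norm. The functions `feature_logit_effects` and `top_and_bottom`, and the toy weights, are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    decoder_row has length d_model; unembed is W_U laid out as
    [d_model][vocab_size]. Any final layer norm is ignored here.
    """
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = list(reversed(ranked))[:k]  # largest scores first
    negative = ranked[:k]                  # most negative scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = ["to", "of", " Dave", "ის"]
    decoder_row = [1.0, -1.0]
    unembed = [
        [0.5, 0.25, 0.0, 0.0],
        [1.0, 0.5, -0.5, -0.25],
    ]
    scores = feature_logit_effects(decoder_row, unembed)
    positive, negative = top_and_bottom(scores, vocab, k=2)
    print("positive:", positive)  # [(' Dave', 0.5), ('ის', 0.25)]
    print("negative:", negative)  # [('to', -0.5), ('of', -0.25)]
```

Sorting the full score list keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid sorting everything.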

Activation Density: 12.205%
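Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below encodes that reading under an assumption the page does not confirm: a position counts as active when its activation is strictly above a threshold of 0. It also assumes, hypothetically, that the 12.205% figure was measured over all 238,145 × 512 dashboard positions, and converts the percentage into an approximate count of active positions. Both helper names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature is active.

    Assumption: "active" means the activation is strictly greater than
    `threshold` (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_positions(density: float, total_positions: int) -> float:
    """Number of positions that a given density corresponds to."""
    return density * total_positions


if __name__ == "__main__":
    print(activation_density([0.0, 1.2, 0.0, 0.3]))  # 0.5
    # Hypothetical: treats the 12.205% figure as measured over all
    # 238,145 x 512 dashboard token positions.
    print(round(expected_active_positions(0.12205, 238_145 * 512)))
```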