Explanation: technical terms and code

Configuration

Prompts (dashboard): 238,145 prompts, 512 tokens each

Dataset (dashboard): lmsys + oasst1

Negative Logits

" Policy"  0.42
" Sebelum"  0.42
" политика"  0.42
"ח"  0.40
" risky"  0.40
" स्वीकृत"  0.39
"ственном"  0.39
"ጥ"  0.39
"seeker"  0.39
"க்கான"  0.38

Positive Logits

" indication"  0.41
" discoloration"  0.40
" kontak"  0.39
" synt"  0.39
" specifications"  0.39
" glorie"  0.37
"UIColor"  0.37
" interag"  0.37
" urination"  0.36
" rearr"  0.36

Activation density: 0.001%
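Activation density is usually the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation above 0 and that the density was measured over the dashboard's own prompt set; neither is stated above. Under those assumptions, 238,145 prompts × 512 tokens = 121,930,240 token positions, so a density of 0.001% corresponds to roughly 1,200 active positions. The function and constant names are hypothetical.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]],
    threshold: float = 0.0,
) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`.

    Assumption: one inner sequence of per-token activations per prompt.
    """
    active = 0
    total = 0
    for prompt_acts in activations:
        for a in prompt_acts:
            total += 1
            if a > threshold:
                active += 1
    return active / total if total else 0.0


# Rough scale implied by the dashboard numbers (assumes the density was
# measured over the same 238,145 x 512-token prompt set).
NUM_TOKENS = 238_145 * 512               # 121,930,240 token positions
DENSITY = 0.001 / 100                    # 0.001% expressed as a fraction
expected_active = NUM_TOKENS * DENSITY   # about 1,219 positions
```

Because the displayed density is rounded, the true count could differ noticeably from 1,219; the figure is only an order-of-magnitude guide.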