INDEX

Explanations

preventing something from happening

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token        Logit
" from"      0.69
" från"      0.64
"س"          0.61
" από"       0.60
"from"       0.59
"ко"         0.57
"我"         0.55
"с"          0.54
" từ"        0.54
"ifrån"      0.54

Positive Logits

Token                Logit
(token not shown)    0.67
" frightening"       0.50
"する"               0.46
" felony"            0.45
"ר"                  0.45
"be"                 0.43
" detoxification"    0.43
" robbery"           0.43
(token not shown)    0.42
" shameful"          0.42

Activations Density 0.042%
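
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes that every dashboard prompt contributes exactly 512 tokens and that the density is the fraction of those tokens on which the feature has a nonzero activation; neither assumption is confirmed by the page itself. The function names are hypothetical and chosen only for illustration.

```python
# Figures copied from the dashboard configuration on this page.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY = 0.042 / 100  # 0.042% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total tokens, assuming every prompt has the full token count."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate number of tokens with nonzero activation.

    Assumption: density is the fraction of all dashboard tokens on which
    the feature fires. The reported density is rounded, so the result is
    only an estimate.
    """
    return round(total * density)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, ACTIVATION_DENSITY)
    print(f"{total:,} tokens in total")
    print(f"about {active:,} tokens with nonzero activation")
```

Under these assumptions the feature fires on roughly one token in every 2,400, which is consistent with a sparse, narrowly scoped feature.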