Explanations

mentioning specific details

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 Determine  0.66
 Demonstrated  0.58
 Understanding  0.55
 demonstrated  0.55
 Determining  0.52
 för  0.51
 Defender  0.51
𝟎  0.51
 для  0.50
۔  0.50

Positive Logits

(blank)  0.79
 erwäh  0.72
 erwähnt  0.69
 mention  0.67
Mention  0.66
提到的  0.66
К  0.63
mention  0.61
(blank)  0.60
 mencion  0.60

Activations Density 0.046%
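
To put the density figure in perspective, the sketch below converts it into an approximate token count. It rests on two assumptions that the page does not state: that the density was measured over the same dashboard prompt set (238,145 prompts of 512 tokens each), and that density means the share of tokens on which the feature has a nonzero activation. The function name is hypothetical and only for illustration.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY_PERCENT = 0.046


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with a nonzero activation).

    Assumption: density is the percentage of all dashboard tokens on which
    the feature fires. The page itself does not define the term.
    """
    total = num_prompts * tokens_per_prompt
    return total, total * density_percent / 100


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY_PERCENT
)
print(f"{total_tokens:,} tokens in total")
print(f"about {active_tokens:,.0f} tokens with a nonzero activation")
```

Under those assumptions the feature fires on roughly 56,000 of about 122 million tokens, which is consistent with a sparse feature tied to a narrow concept such as "mentioning".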