
Explanations

AI assistant safety refusals

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"ᠴ"  0.46
" Hatteras"  0.45
" Polaribacter"  0.44
" कर्नाटक"  0.44
" Karnataka"  0.42
")^*"  0.41
" Moab"  0.41
" हिमाचल"  0.40
"星"  0.39
" MDLVertex"  0.39

Positive Logits

" Glasgow"  1.96
"Glasgow"  1.92
" Glas"  1.59
"glas"  1.52
" glas"  1.52
"Glas"  1.50
" Clyde"  1.05
" ग्लास"  1.01
"グラス"  0.92
" Scottish"  0.88
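Token/logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix. The dashboard does not state its exact procedure, so the following is only a minimal sketch under that assumption. The names `top_logits`, `decoder_vec` and `W_U` are hypothetical. The shapes are assumed to be (d_model,) and (d_model, vocab_size). The sketch returns raw signed values, and this page does not say how the dashboard displays the sign of the negative list.

```python
import numpy as np

def top_logits(decoder_vec: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Hypothetical helper: rank vocabulary tokens by one feature's direct logit effect.

    Assumes decoder_vec has shape (d_model,) and W_U has shape (d_model, vocab_size).
    Returns (positive, negative): the k highest and k lowest (token, logit) pairs.
    """
    logits = decoder_vec @ W_U          # one value per vocabulary token
    order = np.argsort(logits)          # ascending order of logit value
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    return positive, negative

# Toy usage with made-up numbers: a 4-token vocabulary and an identity unembedding.
vocab = ["Glasgow", " Clyde", " Moab", "x"]
pos, neg = top_logits(np.array([2.0, 1.0, -0.5, 0.0]), np.eye(4), vocab, k=2)
print(pos)  # highest-logit tokens first
print(neg)  # lowest-logit tokens first
```

Sorting once with `np.argsort` yields both ends of the ranking in a single pass. For a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.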

Activations Density 0.011%
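Activation density is usually read as the fraction of tokens on which a feature's activation is above zero. The page does not say which token set the 0.011% is measured over. If it were the dashboard prompts listed above (238,145 prompts × 512 tokens = 121,930,240 tokens), the feature would fire on roughly 13,412 tokens. That figure rests on an assumption, not a reported result. Both helper names below are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Hypothetical helper: active-token count implied by a density given in percent.

    Assumes the density was measured over exactly num_prompts * tokens_per_prompt tokens.
    """
    return num_prompts * tokens_per_prompt * density_percent / 100

print(activation_density([0.0, 0.5, 0.0, 1.2]))           # 2 of 4 tokens active
print(round(expected_active_tokens(238_145, 512, 0.011)))  # implied active-token count
```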