INDEX

Explanations

p: legions, steering, models

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Wt: 0.83
Down: 0.79
Vx: 0.77
্স: 0.77
澼: 0.77
Siehe: 0.76
 એપ: 0.76
mediately: 0.75
𝗺: 0.75
Unless: 0.74

Positive Logits

ادر: 0.86
 misappropri: 0.80
 healers: 0.79
 neoliberal: 0.76
 insurrection: 0.74
 Malawi: 0.74
 streetwear: 0.74
 violin: 0.73
 sprawie: 0.73
 Tibetan: 0.73

Activation Density: 0.001%