
Explanations

memorizing, perpetuating, destabilizing, manipulating


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1


Negative Logits

"🫦"  0.44
"ingly"  0.42
"िंग्स"  0.41
"ől"  0.39
"oshin"  0.38
"釗"  0.38
"assurer"  0.38
" जेब"  0.38
"Resultado"  0.38
" querer"  0.37

Positive Logits

"ating"  0.45
"izing"  0.45
"化"  0.44
"ization"  0.42
"isation"  0.41
"ATING"  0.39
"নমেন্ট"  0.38
"ize"  0.37
"erc"  0.36
"gum"  0.36
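The two logit lists rank output tokens by how strongly this feature pushes their logits up or down. The sketch below shows one common way such rankings can be produced. It assumes that the effect on token t is the dot product of the feature's decoder direction with t's unembedding vector. The function names, toy vectors, and k are hypothetical illustrations, not this dashboard's code. The negative-logit values listed above carry no sign, so the dashboard may be displaying magnitudes; the sketch keeps the sign.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect[t] = dot(decoder_direction, unembed[t]) for every token t.
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder vector with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_direction, vec))
        for tok, vec in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most positive tokens, k most negative tokens)."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy 2-dimensional example (made-up vectors, not real model weights).
toy_unembed = {
    "ating": [0.45, 0.1],
    "izing": [0.40, 0.2],
    "ingly": [-0.42, 0.0],
    "querer": [-0.37, 0.5],
    "gum": [0.0, 1.0],
}
effects = logit_effects([1.0, 0.0], toy_unembed)
pos, neg = top_logits(effects, 2)
print(pos)  # [('ating', 0.45), ('izing', 0.4)]
print(neg)  # [('ingly', -0.42), ('querer', -0.37)]
```

Sorting once and slicing from both ends yields both lists in a single pass. On a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.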

Activation density: 0.273%
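Activation density is the share of token positions on which the feature is active. Here is a minimal sketch. It assumes "active" means an activation strictly above a threshold of 0 and that every token position is counted; the helper name and the threshold default are hypothetical. It also works through the token arithmetic for the configured prompt set. That part rests on the unverified assumption that density was measured over all 238,145 × 512 dashboard tokens.

```python
# Hypothetical sketch: activation density = active token positions / all token positions.
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Share of token positions whose activation exceeds `threshold` (assumed definition)."""
    active = 0
    total = 0
    for prompt in activations:      # one sequence of per-token activations per prompt
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy example: 2 of 8 positions fire, so density is 0.25.
print(activation_density([[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.5]]))  # 0.25

# Assumption: density measured over every token of the configured prompt set.
total_tokens = 238_145 * 512                  # 121,930,240 token positions
approx_active = round(total_tokens * 0.00273) # 0.273% of those positions
print(total_tokens, approx_active)            # 121930240 332870
```

If the density was instead computed on a separate sample, the absolute count changes but the ratio keeps the same meaning.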