Explanations

pre-post modifiers

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token           Value
" seag"         0.36
"舉行"          0.35
" shirts"       0.35
" achter"       0.35
" tinha"        0.34
"नियां"          0.33
"音が"          0.33
" prettiest"    0.33
"ጭ"            0.33
"fur"           0.33

Positive Logits

Token               Value
" nuanced"          0.39
" perceptive"       0.39
" scalable"         0.39
" proactive"        0.38
" normative"        0.37
" methodological"   0.37
" societal"         0.36
" mechanistic"      0.36
" impactful"        0.36
" transformative"   0.36

Activations Density 0.042%
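
The dashboard does not say how the density figure is computed. As a rough sketch, assuming it is the fraction of token positions on which the feature has a nonzero activation, it could be computed as below. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and used only for illustration; the total token count simply multiplies the prompt count and prompt length listed under Configuration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    `activations` is a list of per-prompt lists of activation values.
    Assumption: "density" means the share of active positions; the
    dashboard itself does not state its definition.
    """
    total = 0
    active = 0
    for row in activations:
        for value in row:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy data (hypothetical): 2 prompts x 4 tokens, one active position.
toy = [[0.0, 0.0, 1.7, 0.0], [0.0, 0.0, 0.0, 0.0]]
density = activation_density(toy)
print(f"{density:.3%}")  # 12.500%

# Dashboard size from the Configuration section: 238,145 prompts x 512 tokens.
TOTAL_TOKENS = 238_145 * 512
print(TOTAL_TOKENS)  # 121930240
```

Under that assumption, a density of 0.042% over the dashboard prompts would correspond to roughly 51,000 active token positions out of about 122 million.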