Explanations

sparked discussions, improving methods

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token            Logit
"τή"             0.49
" bermanfaat"    0.46
" отлично"       0.46
" well"          0.42
"son"            0.41
" dobrze"        0.41
"ều"             0.40
"ำ"              0.40
" хорошо"        0.40
" sử"            0.40

Positive Logits

Token              Logit
" 힘들"            0.49
"ಏ"                0.48
"缛"               0.46
"僟"               0.44
" perturbations"   0.44
" REACTORS"        0.43
" הסי"             0.43
" सियासी"          0.43
" inéd"            0.43
" craziness"       0.42

Activations Density 0.002%
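
To give a sense of scale for the density figure, here is a rough back-of-the-envelope sketch in Python. It assumes (this is not stated on the page) that the density is the fraction of tokens across the dashboard prompts on which the feature activates, and that every prompt contributes exactly 512 tokens. The displayed 0.002% is likely rounded, so the result is an order-of-magnitude estimate only. The function name is hypothetical.

```python
# Figures copied from the Configuration section of this page.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.002  # as displayed; probably rounded


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens where the feature fires).

    Assumption: density = active tokens / total tokens over the dashboard prompts.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100
    return total_tokens, active_tokens


total, active = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under these assumptions, the feature would fire on roughly a couple of thousand tokens out of about 122 million, which is consistent with a very sparse feature.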