INDEX

Explanations

distinguishing relevant parts/features

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token               Value
" paheli"           0.49
" heartache"        0.49
" troubleshooting"  0.49
" unbelievably"     0.46
" ужа"              0.45
"🅘"                 0.45
" fairytale"        0.44
" heartbreak"       0.44
" строительства"    0.44
" insanely"         0.44

Positive Logits

Token               Value
" latent"           0.63
" spatially"        0.60
" discretized"      0.55
"``"                0.52
" salient"          0.51
" syntactic"        0.51
"learned"           0.49
"global"            0.48
" spatial"          0.48
" underlying"       0.48

Activations Density 0.191%
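
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes, and this page does not confirm it, that activation density is the percentage of dashboard tokens on which the feature has a non-zero activation, and that it was measured over the full 238,145 x 512 token sample listed under Configuration. The function and constant names are hypothetical.

```python
# Rough scale estimate for the activation density figure.
# ASSUMPTION: density = percentage of sampled tokens with non-zero activation,
# measured over every token of every dashboard prompt.

PROMPTS = 238_145          # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512    # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.191    # from "Activations Density"


def total_tokens(prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard sample."""
    return prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the feature fires."""
    return round(total * density_percent / 100)


total = total_tokens(PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"{total:,} tokens sampled, roughly {active:,} with non-zero activation")
```

Under these assumptions the feature would fire on roughly 233 thousand of about 122 million tokens, which is in line with a sparse feature.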