INDEX

Explanations

English token approximation

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token            Logit
 downsides       0.50
 undermined      0.44
 downregulation  0.44
={"              0.43
 undermining     0.43
 immun           0.42
 shameful        0.42
 immunosupp      0.42
🩹               0.42
 stereotypes     0.42

Positive Logits

Token            Logit
و                0.50
(not shown)      0.48
directional      0.46
리               0.46
(not shown)      0.45
lake             0.44
conex            0.44
제               0.43
모               0.42
follow           0.42

Activations Density 0.003%