Explanations

programmed AI safety constraints

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

 bát    0.44
포츠    0.43
峎    0.43
 whereupon    0.43
 historie    0.42
 Hiroshima    0.42
Ono    0.41
 ጥሩ    0.40
ம்ப    0.40
ostante    0.40

Positive Logits

🔥🔥    0.41
 Priorities    0.40
))    0.39
 Меня    0.39
不仅仅    0.39
 uygulam    0.38
 COMPLEXES    0.37
着    0.37
 replicated    0.37
https    0.37

Activations Density: 0.025%