Explanations

stabilization

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token              Logit
" stabilization"   -1.87
" stabilisation"   -1.84
" Stabilization"   -1.63
" stabilized"      -1.52
"stabili"          -1.48
" stabilize"       -1.40
" Stabili"         -1.36
" stabilizing"     -1.35
" stabili"         -1.33
"stability"        -1.27

Positive Logits

Token              Logit
"ist"              0.62
(blank)            0.48
"con"              0.48
"ian"              0.46
"de"               0.45
(blank)            0.44
"em"               0.42
"ses"              0.42
"de"               0.42
"ized"             0.41

Activations Density 0.240%
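
To put the density figure in context, the sketch below works out how many tokens the dashboard sample contains and roughly how many of them the feature would fire on. It assumes that "Activations Density" means the fraction of tokens with a nonzero activation and that it was measured over the same 16,384 x 128 token sample listed under Configuration; neither assumption is stated on the page. The function and constant names are illustrative only.

```python
# Figures copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.240 / 100  # 0.240% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard sample."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density: float) -> int:
    """Approximate count of positions where the feature is active.

    Assumes density is the fraction of token positions with a
    nonzero activation (an assumption, not stated in the source).
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY)

print(f"total tokens:       {total:,}")
print(f"approx. active on:  {active:,}")
```

Under these assumptions the feature is active on about five thousand of roughly two million token positions, which fits a feature tied to a narrow word family such as "stabilization".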