INDEX

Explanations

preference

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted)   Logit
" prefer"        -1.59
" preferred"     -1.53
" Prefer"        -1.53
"prefer"         -1.45
"Prefer"         -1.42
"preferred"      -1.40
" prefers"       -1.39
" Preferred"     -1.34
"Preferred"      -1.25
"ideal"          -1.18

Positive Logits

Token (quoted)   Logit
"to"             0.58
" convincing"    0.50
" purchasing"    0.49
"going"          0.49
" buying"        0.47
" morning"       0.46
" going"         0.45
" doing"         0.45
"imeter"         0.44
"κτή"            0.44

Activations Density 0.434%
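
To give a sense of scale for the density figure, the short sketch below combines it with the prompt configuration listed above. It assumes that "Activations Density" means the fraction of dashboard tokens on which the feature has a nonzero activation, and that the density was measured over the same 16,384 x 128 token sample. Neither assumption is stated on this page, and the variable names are illustrative only.

```python
# Figures copied from the dashboard. The reading of "density" as the share of
# tokens with a nonzero activation is an assumption, not something the page states.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.434

# Total number of tokens in the dashboard sample.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of tokens on which the feature would fire under that assumption.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {round(expected_active):,}")
```

Under these assumptions the feature would be active on roughly nine thousand of about two million tokens, which is consistent with a sparse, narrowly scoped feature.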