Explanations

actions and consequences

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" reacted"      -0.96
" invaded"      -0.95
" sinned"       -0.93
" imitated"     -0.92
" consulted"    -0.91
" succeeded"    -0.88
" cheated"      -0.88
" betrayed"     -0.88
" prayed"       -0.87
" complied"     -0.86

Positive Logits

0.62

new

0.57

‘

0.55

an

0.54

“

0.54

 both

0.53

 recent

0.53

her

0.51

 carefully

0.50

 many

0.49

Activations Density 0.082%
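
As a rough sanity check on the figures above, the sketch below turns the prompt configuration and the activation density into token counts. It assumes the density is the fraction of all dashboard tokens on which the feature activates; the page does not state this, so treat the result as an estimate. The constant and function names are illustrative only.

```python
# Figures copied from the dashboard; the names are hypothetical.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.082


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated token count with a nonzero activation.

    Assumption: density is measured over every token in the dashboard prompts.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under that assumption, the feature fires on roughly 1,700 of about 2.1 million tokens, which is consistent with a sparse, narrowly selective feature.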