Explanations

specifies actions and their consequences

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token        Logit
"set"        -1.44
"mis"        -1.39
"Re"         -1.34
"had"        -1.30
" took"      -1.27
" made"      -1.27
"le"         -1.26
"Her"        -1.25
"si"         -1.24

Positive Logits

Token        Logit
" всички"    1.83
" tunik"     1.72
" bluz"      1.70
" OGSÅ"      1.70
" superbes"  1.70
"mainly"     1.62
" karier"    1.61
" cewek"     1.60
" incrí"     1.59
"Mainly"     1.56

Activations Density 0.009%
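
To put the density figure in context, the short sketch below converts it into an approximate token count. It assumes (this is not stated on the page) that the density is the fraction of tokens on which the feature is active, and that it was measured over the dashboard prompts listed under Configuration. The function name `estimated_active_tokens` is hypothetical and used only for illustration.

```python
def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature is active).

    Assumption: density_percent is the share of all dashboard tokens
    with a nonzero activation, expressed as a percentage.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100
    return total_tokens, active_tokens


# Values taken from the dashboard: 24,576 prompts of 128 tokens, density 0.009%.
total, active = estimated_active_tokens(24_576, 128, 0.009)
print(f"{total:,} tokens in total, roughly {active:.0f} with the feature active")
```

Under these assumptions the feature would be active on a few hundred of roughly three million tokens, which is consistent with a sparse feature.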