Explanations

guidance

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" encouragement"   -1.10
" confusion"       -1.00
"encouragement"    -0.96
" excitement"      -0.91
" fascination"     -0.91
" persuasion"      -0.88
" enthusiasm"      -0.86
" reassurance"     -0.86
" speculation"     -0.84
" uncertainty"     -0.83

Positive Logits

" about"       0.69
" regarding"   0.67
" from"        0.66
"of"           0.60
" based"       0.58
" caused"      0.56
" made"        0.55
" level"       0.55
" with"        0.54
"for"          0.51
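Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below assumes that definition. The helper `top_logit_effects`, the toy vocabulary, and the toy matrices are hypothetical illustrations, not the dashboard's own code.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    decoder_vec: (d_model,) decoder direction of the feature (hypothetical input)
    W_U: (d_model, vocab_size) unembedding matrix of the model
    vocab: list of token strings; a leading space is part of the token
    Returns (negative, positive): the k lowest and k highest (token, logit)
    pairs, each ordered from the most extreme value inward.
    """
    logits = decoder_vec @ W_U   # direct effect on every vocabulary logit
    order = np.argsort(logits)   # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (all hypothetical).
vocab = [" about", " confusion", " of", " level"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 1.0]])
decoder_vec = np.array([0.7, 0.2])

negative, positive = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(negative)
print(positive)
```

Because the ranking uses only the direct path (decoder direction times unembedding), it ignores any effect the feature has through later layers. That is why such lists are usually read as a rough hint about the feature's output role rather than a full account of it.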

Activation Density: 0.047%