Explanations

work with

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" spion"       -0.98
" frau"        -0.96
" underpin"    -0.93
" psychiat"    -0.93
" Márquez"     -0.92
" sezonu"      -0.91
" horloge"     -0.91
" bombar"      -0.91
"绡"           -0.90
"Insights"     -0.90

Positive Logits

"out"          1.76
"up"           1.67
" through"     1.63
"it"           1.52
" with"        1.49
"on"           1.47
" from"        1.29
" backwards"   1.23
" Working"     1.23
" things"      1.20

Activations Density 0.025%
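
To put the density figure in context, the sketch below estimates how many token positions it corresponds to. It assumes (this is not stated on the page) that the density is the share of token positions on which the feature fires, and that it was measured over the same 24,576 prompts of 128 tokens listed under Configuration. The function name is hypothetical and used only for illustration.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.025


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated positions where the feature fires).

    Assumption: density is the percentage of token positions with a
    non-zero activation, measured over this same prompt set.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY_PERCENT
)
print(f"Total token positions: {total_tokens:,}")
print(f"Estimated active positions: about {active_tokens:.0f}")
```

Under those assumptions the feature fires on only a few hundred of roughly three million token positions, which fits a sparse feature with a narrow explanation such as "work with".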