INDEX

Explanations

shown

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" térm"                                  -0.63
" shown"                                 -0.62
" Shown"                                 -0.58
"umumkan"                                -0.55
" sœurs"                                 -0.52
" refiere"                               -0.52
" genoux"                                -0.52
"ruptedException"                        -0.51
" prochaines"                            -0.51
" plais"                                 -0.51

Positive Logits

Token (quoted to show leading spaces)    Logit
" that"                                  0.76
" EconPapers"                            0.60
"InjectAttribute"                        0.60
"+:+"                                    0.60
"me"                                     0.59
" noDo"                                  0.58
"tovers"                                 0.56
"();)"                                   0.55
"how"                                    0.54
"us"                                     0.54

Activations Density 0.031%
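
To put the density figure in context, here is a rough arithmetic sketch using only the numbers listed on this page. It assumes that "Activations Density" means the share of token positions in the dashboard prompts where the feature has a non-zero activation; the page does not state this definition, so treat the result as an estimate. The function names are hypothetical helpers, not part of any dashboard API.

```python
# Figures copied from the dashboard configuration and density line.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.031  # "Activations Density 0.031%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of token positions where the feature is active.

    Assumption: density is the percentage of token positions with a
    non-zero activation. The page itself does not define the term.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"{total:,} token positions in total")          # 2,097,152
print(f"about {active:.0f} active token positions")   # about 650
```

Under that assumption, the feature would fire on roughly 650 of about 2.1 million token positions, which is consistent with a sparse feature.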