INDEX

Explanations

neutral, pausing, scripts

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 neutral      -1.47
neutral       -1.34
 Neutral      -1.27
Neutral       -1.19
 neutrality   -1.08
 neutre       -0.96
 nation       -0.91
 neutr        -0.90
 нейтра       -0.89
 pause        -0.88

Positive Logits

 pretence        0.46
act              0.46
istoitu          0.46
(!               0.45
 bezeichneter    0.45
 unworthy        0.44
element          0.44
fjspx            0.44
 pretreatment    0.44
Westfalen        0.44

Activations Density 1.533%
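
To put the density figure in context, here is a rough sketch of the arithmetic, assuming that the density is the fraction of dashboard tokens on which the feature has a nonzero activation and that it was measured over exactly the 16,384 prompts of 128 tokens listed under Configuration. Both points are assumptions, not something the dashboard states, and the variable names are illustrative only.

```python
# Dashboard configuration as listed above.
prompts = 16_384
tokens_per_prompt = 128

# Reported activation density, converted from a percentage to a fraction.
density = 1.533 / 100

# Total tokens in the dashboard sample.
total_tokens = prompts * tokens_per_prompt

# Assumption: density = (tokens with nonzero activation) / (total tokens),
# so this is only an estimate of how many tokens activate the feature.
approx_active_tokens = round(total_tokens * density)

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under these assumptions the feature fires on roughly 32,000 of about 2.1 million tokens, so it is sparse but not rare.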