Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

 Anſ         -1.16
 houſe       -1.14
 Houſe       -1.13
 purpoſe     -1.13
 myſelf      -1.12
 pleaſure    -1.10
 ſmall       -1.09
 ſche        -1.05
 ſever       -1.05
 Majefty     -1.03

Positive Logits

of            0.84
is            0.75
              0.72
on            0.72
in            0.69
              0.68
for           0.68
and           0.63
 also         0.61

Activations Density 0.456%