Explanations

emphasis and confirmation

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token         Logit
Инг           -1.66
؎             -1.55
🌫            -1.50
 asesinado    -1.45
 něm          -1.45
centaje       -1.43
"}            -1.43
)}            -1.41
ांकि          -1.40
挥手          -1.38

Positive Logits

Token         Logit
is            1.84
(blank)       1.71
(blank)       1.68
of            1.64
-*            1.63
one           1.58
id            1.58
al            1.57
 other        1.56
for           1.51

Activations Density 0.002%
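
The configuration and density figures above can be related with a little arithmetic. The sketch below is illustrative only: it assumes that "Activations Density" means the fraction of dashboard token positions on which the feature fires, and that the displayed 0.002% is a rounded value. The function names are hypothetical and do not come from any dashboard API.

```python
# Figures copied from the dashboard: 24,576 prompts of 128 tokens each,
# and a displayed activation density of 0.002% (presumably rounded).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.002


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def approx_active_tokens(total: int, density_percent: float) -> float:
    """Rough count of positions where the feature is active.

    Assumption: density is the fraction of token positions with a
    nonzero activation, expressed as a percentage.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = approx_active_tokens(total, DENSITY_PERCENT)

print(f"total token positions: {total:,}")
print(f"approx. active positions: {active:.0f}")
```

Under these assumptions the feature would fire on only a few dozen of roughly three million token positions, so the estimate is sensitive to how the displayed density was rounded.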