Explanations

expressions of affirmation or affirmation-related language

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
" itſelf"      -1.19
" ſever"       -1.02
" myſelf"      -1.01
" pleaſure"    -0.98
" ſche"        -0.97
" ſmall"       -0.96
" Monfieur"    -0.96
" raiſ"        -0.95
" Diſ"         -0.94
" Houſe"       -0.94

Positive Logits

Token                Logit
"li"                 0.61
" null"              0.54
"Mar"                0.53
"di"                 0.53
"gj"                 0.53
"Di"                 0.52
(token not shown)    0.51
"de"                 0.51
"“"                  0.51
"ab"                 0.51

Activations Density 0.189%
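
As a rough sanity check on the figures above, the sketch below computes the total number of tokens in the dashboard prompts. It then estimates how many of those tokens would activate the feature. That second step assumes the activation density is measured over the same prompt set, which the page does not state. The function names are illustrative, not part of any dashboard API.

```python
# Figures copied from the dashboard above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.189 / 100  # 0.189% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of tokens on which the feature is non-zero.

    ASSUMPTION: the density is computed over the same tokens as `total`.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)
print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:,}")
```

Under that assumption, the feature fires on roughly six thousand of about three million tokens, which is consistent with a sparse feature.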