Explanations

technical/academic citations

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token            Logit
" évidemment"    -0.79
" bajos"         -0.78
"}}\"            -0.78
" jamais"        -0.77
"arked"          -0.75
";="             -0.72
"हरू"            -0.72
"dedicated"      -0.71
"cal"            -0.70
" Fake"          -0.70

Positive Logits

Token            Logit
" Abba"          0.83
"illez"          0.80
"ffs"            0.75
"%-"             0.75
" handout"       0.74
"Uru"            0.73
"سیون"           0.73
" 🥺"            0.73
" boho"          0.72
" minum"         0.72

Activations Density 0.035%
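
To put the density figure in context, the sketch below relates it to the dashboard configuration listed above (24,576 prompts of 128 tokens each). It assumes that "Activations Density" means the percentage of dashboard tokens on which the feature has a nonzero activation; the page does not state this definition, and the function names are hypothetical, chosen only for illustration.

```python
# Figures copied from the dashboard configuration and the density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.035  # reported value, already rounded by the dashboard


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def implied_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the feature fires.

    Assumption (not stated on the page): density is
    active tokens / total tokens, expressed as a percentage.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = implied_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total}, implied active tokens: {active}")
```

Under that assumption the feature fires on roughly 1,100 of about 3.1 million tokens. Because the reported density is rounded, the implied count is only approximate.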