Explanations

head

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token              Logit
[token missing]    -0.54
" femininas"       -0.51
" skydd"           -0.51
" afectadas"       -0.47
"Enders"           -0.47
" volgt"           -0.47
"works"            -0.47
" oscura"          -0.46
" falsas"          -0.46
" debía"           -0.46

Positive Logits

Token              Logit
"QUENCE"           0.76
"the"              0.73
"Aholisi"          0.69
"Eksterne"         0.68
" Préférences"     0.68
" redistribute"    0.65
"Hauptartikel"     0.65
" their"           0.64
"wixt"             0.64
" simulate"        0.63

Activations Density 0.080%
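
For a rough sense of scale, the density figure can be combined with the dashboard configuration (16,384 prompts, 128 tokens each). The sketch below assumes the density is the fraction of dashboard tokens on which the feature has a nonzero activation; the page does not state this, so treat the result as an estimate only. The function and constant names are illustrative, not part of any dashboard API.

```python
# Dashboard configuration as stated on the page.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.080 / 100  # 0.080% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(density: float, n_tokens: int) -> float:
    """Estimated count of token positions where the feature fires.

    Assumption: density is measured over the same tokens as the dashboard
    prompts, as a fraction of positions with nonzero activation.
    """
    return density * n_tokens


n = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
print(n)                                                      # total token positions
print(round(expected_active_tokens(ACTIVATION_DENSITY, n)))   # estimated active positions
```

Under that assumption the feature would fire on roughly 1,700 of about 2.1 million token positions.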