Explanations

references to figures

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 الرياضيه   -0.67
openhauer   -0.61
riezmann   -0.58
prüche   -0.58
 betweenstory   -0.57
 Pollutants   -0.57
 psoriasis   -0.56
ftagPool   -0.56
 gyrus   -0.56
+#+   -0.56

Positive Logits

HideFlags   0.58
epam   0.57
Aiheesta   0.54
horabuena   0.54
 confira   0.54
робнее   0.53
rave   0.53
tanleria   0.52
ұл   0.52
feitura   0.52

Activation Density: 0.009%
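To get a rough sense of how rarely this feature fires, assume the density is the fraction of token positions in the dashboard sample (24,576 prompts of 128 tokens each) on which the feature is active. The page does not state which sample the density is measured over, so this is an assumption. The arithmetic as a small sketch:

```python
# Figures from the dashboard configuration and density line.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY = 0.009 / 100  # 0.009% expressed as a fraction

# Assumption: density is measured over exactly these dashboard prompts.
total_tokens = PROMPTS * TOKENS_PER_PROMPT
expected_active = total_tokens * DENSITY

print(f"{total_tokens:,} token positions")               # 3,145,728 token positions
print(f"~{round(expected_active)} expected activations")  # ~283 expected activations
```

Under that assumption, the feature would be active on only about 283 of roughly 3.1 million token positions.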