INDEX

Explanations

derogatory language and humiliation

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
"his"          -1.98
" About"       -1.84
"Are"          -1.75
" decides"     -1.72
"At"           -1.71
(not shown)    -1.69
" resists"     -1.69
"Our"          -1.64
"Who"          -1.63
" examines"    -1.63

Positive Logits

Token          Logit
"埽"           1.98
"and"          1.78
" สอง"         1.68
"lc"           1.64
" altında"     1.64
"仉"           1.63
"!');"         1.63
"一下"         1.60
" blauwe"      1.60
"檍"           1.59

Activations Density 0.008%
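
To put the density figure in context, the short sketch below works through the arithmetic on the numbers quoted above. It assumes that the activation density is the fraction of tokens on which the feature is non-zero, and that it was measured over the same 24,576 prompts of 128 tokens each; the page states neither of these explicitly, so the result should be read as a rough estimate. The function and constant names are illustrative only.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.008  # "Activations Density 0.008%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Estimated number of tokens on which the feature fires.

    Assumption: density is the share of tokens with non-zero activation,
    measured over this same prompt set.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under those assumptions the feature fires on only a few hundred of roughly three million tokens, which is consistent with a very sparse feature.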