Explanations

Words related to censorship or content moderation

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted; a leading space is part of the token)    Logit
' كره'                                                  -0.59
' Avan'                                                 -0.54
'Avan'                                                  -0.52
'%");'                                                  -0.50
'Crm'                                                   -0.50
'kannt'                                                 -0.49
'SqlDataReader'                                         -0.49
' Roskov'                                               -0.48
' useAuth'                                              -0.48
'actionMode'                                            -0.48

Positive Logits

Token (quoted; a leading space is part of the token)    Logit
' censorship'                                           3.66
' censor'                                               3.08
' Cens'                                                 2.84
' censored'                                             2.69
' censura'                                              2.27
'cens'                                                  2.09
'censored'                                              1.62
' cens'                                                 1.46
' censure'                                              1.19
'ensors'                                                1.11
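
For readers unfamiliar with logit lists like the ones above: dashboards of this kind commonly obtain them by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. Whether this page computes its values exactly that way is an assumption. The sketch below illustrates the idea in plain Python; the two-dimensional `direction`, the `unembed` matrix, and the four-token `vocab` are all made-up toy values, not data from this feature.

```python
def logit_effects(direction, unembed, vocab):
    """Dot the feature direction with each vocabulary column of the unembedding.

    direction: list of d floats (hypothetical decoder direction)
    unembed:   d rows, each with len(vocab) floats (hypothetical unembedding)
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * row[j] for d, row in zip(direction, unembed))
    return effects


def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy values, chosen only for illustration.
vocab = [" censorship", " censor", "Avan", " useAuth"]
direction = [1.0, 0.5]
unembed = [
    [3.0, 2.5, -0.4, -0.2],
    [1.0, 1.0, -0.2, -0.4],
]

effects = logit_effects(direction, unembed, vocab)
print("most promoted:", top_k(effects, 2))
print("most suppressed:", top_k(effects, 1, positive=False))
```

Under this reading, a large positive value means the feature pushes the model toward predicting that token, and a negative value means it pushes away from it.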

Activation Density: 0.001%
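
To give the density figure some scale, the arithmetic below combines it with the prompt configuration listed above. It assumes the density is measured as a fraction of tokens over that same prompt set, which the page does not state, and the displayed 0.001% is presumably rounded, so the resulting count is only an order-of-magnitude estimate.

```python
# Values taken from the Configuration and Activation Density entries.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.001  # as displayed; likely rounded

# Total tokens in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt  # 3,145,728

# Assumption: density = activating tokens / total tokens.
approx_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:.0f}")
```

On these assumptions the feature fires on roughly a few dozen tokens out of about 3.1 million, which is consistent with a narrow, topic-specific feature.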