INDEX

Explanations

derogatory terms and insults

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B
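The configuration implies 24,576 × 128 = 3,145,728 token positions of dashboard data. The page does not say how the prompts were cut from SlimPajama. The sketch below assumes the simplest scheme, contiguous non-overlapping 128-token windows over an already-tokenized stream. `make_dashboard_prompts` is a hypothetical helper, not part of any dashboard library.

```python
from typing import List


def make_dashboard_prompts(token_ids: List[int], prompt_len: int = 128,
                           n_prompts: int = 24_576) -> List[List[int]]:
    """Split a flat token stream into fixed-length prompts.

    Assumption (hypothetical): prompts are contiguous, non-overlapping windows.
    A real pipeline may instead sample documents, shuffle, or prepend a BOS token.
    """
    prompts: List[List[int]] = []
    # Step through the stream in strides of prompt_len, dropping any partial tail.
    for start in range(0, len(token_ids) - prompt_len + 1, prompt_len):
        if len(prompts) >= n_prompts:
            break
        prompts.append(token_ids[start:start + prompt_len])
    return prompts


# Total token positions covered by the dashboard configuration.
print(24_576 * 128)  # 3145728
```

With a full-sized stream, the function returns 24,576 lists of 128 token ids each.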

Negative Logits

Token        Logit
"ists"       -0.07
"ajs"        -0.07
"erli"       -0.07
"gebung"     -0.07
"porter"     -0.07
"aris"       -0.06
"orie"       -0.06
"mates"      -0.06
"Lig"        -0.06
"PORT"       -0.06
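The negative and positive logit lists on this page report which output tokens the feature pushes down or up. A common way SAE dashboards compute such lists, assumed here rather than confirmed for this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the most negative and most positive scores. The sketch uses a toy two-dimensional model. `feature_logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and a real pipeline may also fold in a final LayerNorm before the unembedding.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def feature_logit_effects(decoder_dir: List[float],
                          unembed: Dict[str, List[float]],
                          k: int = 10) -> Tuple[Scored, Scored]:
    """Score every token by <decoder_dir, unembedding column> and return
    the k most positive and the k most negative (token, score) pairs."""
    scores: Dict[str, float] = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        # Direct effect of one unit of feature activation on this token's logit.
        scores[tok] = sum(d * w for d, w in zip(decoder_dir, col))
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_positive, most_negative


# Toy example: a 2-d "residual stream" and a 3-token vocabulary.
pos, neg = feature_logit_effects(
    [1.0, -1.0],
    {"who": [0.3, 0.0], "ists": [0.0, 0.2], "the": [0.1, 0.1]},
    k=1,
)
print(pos, neg)
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with a single ranking.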

Positive Logits

Token                 Logit
"who"                 0.10
"ery"                 0.09
"who"                 0.08
"rous"                0.08
"hood"                0.08
"/exp"                0.07
" whom"               0.07
"ERY"                 0.07
"蛋" (raw "èĽĭ")      0.07
"like"                0.07
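One positive-logit token appears as èĽĭ in the raw dashboard output. That string is the byte-level BPE form used by GPT-2-style tokenizers, which map each raw byte to a printable stand-in character. Assuming this model uses that scheme (the page does not name the tokenizer), reversing the mapping recovers the UTF-8 bytes E8 9B 8B, which decode to 蛋 ("egg"). `bytes_to_unicode` below reproduces GPT-2's byte-to-character table; `decode_bpe_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict:
    """Rebuild GPT-2's byte -> printable-character table (byte-level BPE)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable stand-in character -> raw byte value.
_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_bpe_token(token: str) -> str:
    """Map each stand-in character back to its byte, then decode as UTF-8."""
    raw = bytes(_BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_bpe_token("èĽĭ"))  # 蛋
```

Tokens beginning with `Ġ` decode the same way to a leading space; for example, `decode_bpe_token("Ġwhom")` returns `" whom"`.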

Activation Density: 0.046%
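Activation density is the share of token positions on which the feature fires (activation above zero). As a rough scale check, suppose the 0.046% figure were measured over the 24,576 × 128 = 3,145,728 dashboard token positions. The page does not say which sample it uses, so this is an assumption. Under it, the feature would fire on about 1,447 positions. `activation_density` and `expected_firing_count` are hypothetical helper names.

```python
from typing import Sequence


def activation_density(acts: Sequence[float]) -> float:
    """Fraction of token positions where the feature activation is strictly positive."""
    if len(acts) == 0:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > 0) / len(acts)


def expected_firing_count(n_tokens: int, density_pct: float) -> int:
    """Approximate number of firing positions, given a density in percent."""
    return round(n_tokens * density_pct / 100)


# Assumption: density measured over all dashboard token positions.
n_tokens = 24_576 * 128
print(n_tokens, expected_firing_count(n_tokens, 0.046))  # 3145728 1447
```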