Explanations

phrases referring to various forms of abuse and misuse of power

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

rei            -0.09
atura          -0.08
iky            -0.08
lify           -0.08
シア           -0.08
istributions   -0.08
cheng          -0.07
uida           -0.07
aras           -0.07
apsed          -0.07

Positive Logits

/add           0.07
 Dhabi         0.07
fully          0.07
able           0.07
ュ             0.06
antium         0.06
full           0.06
erland         0.06
ant            0.06
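Token/logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below illustrates that idea on random toy data only. The names `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical, and the page does not confirm that its values were computed this way.

```python
import numpy as np


def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    decoder_dir: (d_model,) feature direction (hypothetical input)
    unembed:     (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:       list of vocab_size token strings
    """
    logits = decoder_dir @ unembed   # one logit contribution per vocabulary token
    order = np.argsort(logits)       # indices sorted from lowest to highest logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy data, not taken from any real model.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = rng.normal(size=d_model)
unembed = rng.normal(size=(d_model, vocab_size))

negative, positive = top_logit_tokens(decoder_dir, unembed, vocab, k=3)
```

Logit magnitudes as small as those listed here (at most 0.09 in absolute value) suggest the feature has only a weak direct effect on any single output token, so the token lists are best read as a loose hint rather than a description of the feature.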

Activations Density 0.009%
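To put the density figure in context, here is a minimal sketch that treats activation density as the fraction of tokens on which the feature fires, and assumes the figure was measured over the prompt set listed under Configuration (24,576 prompts of 128 tokens each). Both points are assumptions rather than something the page states, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Figures taken from the page.
N_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
total_tokens = N_PROMPTS * TOKENS_PER_PROMPT      # 3,145,728 tokens

reported_density = 0.009 / 100                    # 0.009% expressed as a fraction
approx_active_tokens = round(total_tokens * reported_density)  # roughly 283 tokens

# Toy check of the helper: 3 active tokens out of 10,000.
toy = np.zeros(10_000)
toy[:3] = 1.5
toy_density = activation_density(toy)
```

Under these assumptions the feature would be active on only a few hundred of the roughly 3.1 million tokens, which is consistent with a sparse, narrowly scoped feature.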