INDEX

Explanations

instances of negation or refusal

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token          Logit
"plex"         -0.07
"ertz"         -0.07
"ycastle"      -0.07
"/licenses"    -0.07
"inine"        -0.06
"ookies"       -0.06
"ë¹Ļ"          -0.06
"xes"          -0.06
"enerate"      -0.06
"ÌĪ"           -0.06

Positive Logits

Token          Logit
"uds"          0.07
"assy"         0.06
" Wade"        0.06
"ãģ®äºº"       0.06
"bod"          0.06
"ddd"          0.06
" Huck"        0.06
" selectable"  0.06
"ÏĦÎ·"         0.06
"Parallel"     0.06
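
Several entries in the logit lists (for example ãģ®äºº and ÏĦÎ·) look like mojibake, but they may simply be the tokenizer's raw display form. The sketch below assumes the underlying model uses a GPT-2 style byte-level BPE vocabulary, which the page does not state. Under that assumption each displayed character stands for one byte, and the original text can be recovered by inverting the byte-to-character table. The function names are illustrative and do not come from the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the tokenizer behind this dashboard follows the same scheme.
    """
    # Bytes that are displayed as themselves.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a displayed token string back into readable text."""
    raw = bytearray()
    for ch in display:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end mid-character, so tolerate incomplete UTF-8 sequences.
    return bytes(raw).decode("utf-8", errors="replace")


for token in ["ãģ®äºº", "ÏĦÎ·", "plex", " Wade"]:
    print(repr(token), "->", repr(decode_token(token)))
```

ASCII tokens such as plex pass through unchanged, so the function can be applied to every row. Only the non-ASCII rows change.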

Activations Density 0.002%
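
To put the density figure in context, the arithmetic below estimates how many token positions the feature fires on. It assumes the 0.002% density is measured over the same 24,576 prompts of 128 tokens listed under Prompts (Dashboard), which the page does not confirm. The displayed percentage is rounded, so the result is only approximate.

```python
# Values copied from the dashboard.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.002  # rounded value as displayed

# Assumption: density = activating token positions / all token positions.
total_tokens = PROMPTS * TOKENS_PER_PROMPT
approx_active = total_tokens * DENSITY_PERCENT / 100

print(f"total token positions: {total_tokens:,}")
print(f"approx. activating positions: {round(approx_active)}")
```

Under these assumptions the feature is active on a few dozen of roughly 3.1 million positions.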