Explanations

phrases indicating feelings of guilt or accusations of wrongdoing

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token (quoted to show leading spaces)    Logit
"TRACE"                                  -0.08
"erli"                                   -0.08
" Trace"                                 -0.08
"bay"                                    -0.08
"æĤł"                                    -0.07
" Ð»Ð¾Ð¶"                                -0.07
"ë³"                                     -0.07
"è½"                                     -0.07
"Trace"                                  -0.07
" ÑĥÐ»ÑĥÑĩ"                              -0.07

Positive Logits

Token (quoted to show leading spaces)    Logit
" catch"                                 0.06
" both"                                  0.06
"islav"                                  0.06
"ilver"                                  0.06
" definition"                            0.06
"catch"                                  0.06
"mith"                                   0.06
"cher"                                   0.06
" revis"                                 0.06
"opyright"                               0.05

Activations Density 0.002%
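
Several entries in the negative-logit list (for example "æĤł" and " Ð»Ð¾Ð¶") look like the raw vocabulary strings of a byte-level BPE tokenizer rather than readable text. The sketch below shows one way to turn them back into text, assuming the model uses the GPT-2 style byte-to-unicode mapping. The page does not state which tokenizer is used, so that mapping is an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte is shifted to a code point at U+0100 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Decode one displayed token string to text.

    A literal space is treated as byte 0x20, on the assumption that the
    dashboard already rendered the tokenizer's space marker as a space.
    Incomplete UTF-8 sequences become U+FFFD instead of raising.
    """
    raw = bytearray()
    for ch in token:
        raw.append(0x20 if ch == " " else UNICODE_TO_BYTE[ch])
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æĤł", " Ð»Ð¾Ð¶", " ÑĥÐ»ÑĥÑĩ", "ë³", "è½", "Trace"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, " Ð»Ð¾Ð¶" and " ÑĥÐ»ÑĥÑĩ" decode to Cyrillic word fragments and "æĤł" decodes to a single CJK character. "ë³" and "è½" are partial multi-byte sequences, so they cannot be decoded on their own and come out as a replacement character.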