Explanations

phrases indicating denial or dismissal of accountability

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

"è¬"              -0.08
"mouseleave"      -0.07
"uart"            -0.07
" hete"           -0.07
".scalablytyped"  -0.07
"URY"             -0.07
"defgroup"        -0.07
"orgia"           -0.07
"overn"           -0.07
"interop"         -0.07

Positive Logits

" really"    0.07
" anymore"   0.06
"really"     0.06
" Really"    0.06
"Really"     0.06
" actually"  0.06
"emie"       0.05
" fatt"      0.05
"awk"        0.05

Activations Density 0.028%