Explanations

references to harm, injuries, or casualties in various contexts

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

erville    -0.07
ikes       -0.07
æª         -0.06
θέ         -0.06
ville      -0.06
ough       -0.05
 Pioneer   -0.05
innie      -0.05
ils        -0.05
 Bolton    -0.05

Positive Logits

ในการ             0.09
.scalablytyped    0.08
 accordingly      0.07
izzy              0.07
žen               0.07
lightbox          0.07
 dabei            0.07
 dafür            0.07
edar              0.06
ối                0.06

Activations Density 0.045%
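
To put the density figure in context, the short sketch below works out how many token positions it would correspond to. It assumes, which the page does not state, that the density is the fraction of token positions in the dashboard prompt set (24,576 prompts of 128 tokens each) on which the feature is active. The variable names are illustrative only.

```python
# Dashboard configuration, as listed under "Prompts (Dashboard)".
prompts = 24_576
tokens_per_prompt = 128

# Activation density from the page, converted from a percentage to a fraction.
density = 0.045 / 100

# Total token positions in the dashboard prompt set.
total_tokens = prompts * tokens_per_prompt

# ASSUMPTION: density = active positions / total positions over this prompt set.
expected_active = total_tokens * density

print(f"total tokens: {total_tokens:,}")
print(f"approx. active positions: {round(expected_active):,}")
```

Under that assumption the feature would fire on roughly 1,400 of about 3.1 million token positions, which fits a sparse feature.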