Explanations

words and phrases related to harm, violence, and injuries

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): cerebras/SlimPajama-627B

Negative Logits

Token      Logit
"/new"     -0.07
"asso"     -0.07
"TL"       -0.07
"ego"      -0.06
"PR"       -0.06
"ὰ"        -0.06
"aron"     -0.06
"ickey"    -0.06
"eg"       -0.06
"agn"      -0.06

Positive Logits

Token          Logit
" somebody"    0.09
" someone"     0.09
" others"      0.08
" anybody"     0.08
"ール"         0.07
" anyone"      0.07
" people"      0.07
"others"       0.07
" oppon"       0.07
"someone"      0.07

Activations Density 0.038%
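
As a rough sanity check, the prompt count and prompt length listed under Configuration can be combined with the density figure to estimate how many tokens the feature fires on. This is only a sketch: it assumes that "Activations Density" means the fraction of token positions in the dashboard prompt set with a nonzero activation, which the page itself does not define. The function name is hypothetical.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.038  # "Activations Density 0.038%"


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of activating tokens).

    Assumption: density is the share of token positions on which the
    feature has a nonzero activation.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"{total_tokens:,} tokens in the dashboard prompt set")
print(f"roughly {active_tokens:,.0f} tokens with a nonzero activation")
```

Under that assumption the prompt set holds 3,145,728 tokens, of which on the order of 1,200 activate the feature, so it is a sparse feature.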