INDEX

Explanations

concerns about potential negative outcomes and risks

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token           Logit
"ober"          -0.07
"_probability"  -0.07
"ÐµÑĢÐ¿"        -0.07
"oden"          -0.07
"jedn"          -0.07
"íĻĺ"           -0.07
"ãĥ³ãĥĩ"        -0.07
"iloc"          -0.07
"ivor"          -0.06
"ëĮĢìĿĺ"        -0.06

Positive Logits

Token           Logit
"too"           0.12
" might"        0.12
"might"         0.11
"TOO"           0.11
"too"           0.10
" Might"        0.10
"å¤ª"           0.09
"Too"           0.08
"-too"          0.08
"Too"           0.08
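
Several tokens in the logit lists (for example "å¤ª" and "ÐµÑĢÐ¿") look like the printable stand-ins that GPT-2-style byte-level BPE tokenizers use for raw bytes. The sketch below assumes that this is the encoding in use, which the page itself does not state, and maps such strings back to readable text. The function names are hypothetical and chosen only for illustration.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2 byte-to-unicode table (char -> byte value).

    Assumption: the dashboard shows tokens in GPT-2-style byte-level form.
    """
    # Bytes that are displayed as themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is displayed as the code point 256 + shift.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {chr(cp): b for b, cp in zip(byte_values, code_points)}


_DECODER = gpt2_byte_decoder()


def decode_token(token: str) -> str:
    """Turn a byte-level token string into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in _DECODER:
            raw.append(_DECODER[ch])
        else:
            # Characters outside the table (such as a literal space) pass through.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end mid-character, so invalid bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


print(decode_token("å¤ª"))
print(decode_token("ÐµÑĢÐ¿"))
```

Under this assumption, "å¤ª" decodes to "太", a Chinese character commonly translated as "too", which would be consistent with the other positive-logit tokens. Tokens that cover only part of a multi-byte character will show replacement characters after decoding.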

Activations Density 0.043%
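
As a rough sanity check, the density can be related to the prompt configuration listed above. The sketch assumes that the density is the fraction of tokens on which the feature is active and that it was measured on the same 24,576 prompts of 128 tokens each. Neither assumption is confirmed by this page, and the variable names are illustrative only.

```python
# Values taken from the page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.043  # rounded on the page, so the result is approximate

# Total number of token positions in the prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = active tokens / total tokens.
estimated_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

With these inputs the prompt set holds 3,145,728 tokens, so a density of 0.043% would correspond to roughly 1,350 activating tokens.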