INDEX

Explanations

references to moral principles and the consequences of actions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token        Logit
"redd"       -0.06
"zers"       -0.06
"èª"         -0.06
"è³"         -0.06
"vana"       -0.06
"enal"       -0.05
" èª"        -0.05
" mission"   -0.05
" expense"   -0.05
"zee"        -0.05

Positive Logits

Token        Logit
" rewards"   0.09
" Rewards"   0.08
"rophe"      0.07
"_ALIAS"     0.07
" münchen"   0.07
"กรรม"       0.07
" nett"      0.07
"ubar"       0.07
" rewarded"  0.06
"бов"        0.06

Activations Density 0.019%
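
To put the density figure in context, the short sketch below estimates how many token positions it corresponds to. It assumes (this is not stated on the page) that the density is the fraction of token positions with a non-zero activation, measured over the 24,576 prompts of 128 tokens each listed under Configuration. The function name is hypothetical and used only for illustration.

```python
def approx_active_tokens(n_prompts: int, tokens_per_prompt: int, density_percent: float):
    """Return (total token positions, estimated active positions).

    Assumption: density_percent is the share of token positions on which
    the feature fires, expressed as a percentage.
    """
    total_tokens = n_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Values taken from the Configuration and Activations Density entries.
total, active = approx_active_tokens(24_576, 128, 0.019)
print(f"total token positions: {total:,}")
print(f"estimated active positions: {active:.0f}")
```

Under that assumption, the feature would be active on only a few hundred of roughly three million token positions, which is consistent with a sparse feature.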