Explanations

terms and phrases related to regulations and their impacts on various systems

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token          Logit
"podob"        -0.07
"scoped"       -0.07
"erras"        -0.07
(not shown)    -0.06
"IMIT"         -0.06
"有关"         -0.06
"ặt"           -0.06
"MinMax"       -0.06
"Kov"          -0.06
"敏"           -0.06
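
Some tokens in logit lists like this one are stored as byte-level BPE strings, so a multi-byte UTF-8 character can show up as a sequence such as `æľīåħ³`. The sketch below assumes the model behind this feature uses the GPT-2 byte-to-unicode alphabet (an assumption; the page does not name the tokenizer) and inverts that mapping to recover readable text. The helper names `gpt2_byte_alphabet` and `decode_token` are hypothetical.

```python
def gpt2_byte_alphabet():
    """Map each byte value (0-255) to the printable character GPT-2 style
    byte-level BPE uses to display it (assumed tokenizer convention)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


def decode_token(token):
    """Turn a byte-level token string back into UTF-8 text."""
    char_to_byte = {c: b for b, c in gpt2_byte_alphabet().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("æľīåħ³"))  # 有关
print(decode_token("áº·t"))     # ặt
print(decode_token("æķı"))      # 敏
```

Plain ASCII tokens such as `podob` pass through unchanged, because printable ASCII bytes map to themselves in this alphabet.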

Positive Logits

Token             Logit
" reward"         0.10
" incentiv"       0.10
" rewards"        0.10
" rewarding"      0.10
" discrimin"      0.10
" discrim"        0.09
" discriminate"   0.09
" rewarded"       0.09
" saddle"         0.09
" Reward"         0.09

Activations Density 0.066%
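
To put the density figure in context, here is a minimal sketch of the arithmetic. It assumes that activation density means the fraction of tokens on which the feature has a nonzero activation, and that it is measured over the full prompt set listed under Configuration (24,576 prompts of 128 tokens each). Both are assumptions, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    `activations` is a list of per-prompt lists of per-token values.
    """
    flat = [a for prompt in activations for a in prompt]
    return sum(a > threshold for a in flat) / len(flat)


# Toy example: one active token out of eight.
toy = [[0.0, 0.0, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(activation_density(toy))  # 0.125

# Scale of the reported figure, under the assumptions above.
total_tokens = 24_576 * 128                        # 3,145,728 tokens
approx_active = round(total_tokens * 0.066 / 100)  # roughly 2,076 tokens
print(total_tokens, approx_active)
```

Under these assumptions, a density of 0.066% corresponds to the feature firing on roughly two thousand of about 3.1 million tokens.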