Explanations

terms related to consequences, impacts, and values in various contexts

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

ortal

-0.08

agh

-0.07

storybook

-0.07

“He

-0.07

-0.06

对方

-0.06

ighbor

-0.06

edia

-0.06

porno

-0.06

CLAIM

-0.06

Positive Logits

him

0.12

his

0.11

me

0.11

us

0.10

you

0.10

sua

0.10

自己

0.10

 their

0.09

 jego

0.09

 suas

0.09

Activations Density 0.003%
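
To put the density figure in context, the short sketch below estimates how many tokens in the dashboard prompt set would carry a nonzero activation. It assumes that "Activations Density" means the fraction of all prompt tokens on which the feature fires, and that the 24,576 prompts of 128 tokens each make up the whole sample; neither assumption is confirmed by the page. The constant and function names are illustrative only.

```python
# Figures copied from the dashboard configuration; the interpretation of
# "density" as a fraction of all prompt tokens is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.003  # reported value, already rounded on the page


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of tokens with a nonzero activation."""
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Because the reported density is rounded to one significant figure, the result is only an order-of-magnitude estimate: under these assumptions the feature fires on roughly a hundred tokens out of about three million.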