INDEX

Explanations

concepts related to ethical behavior and moral conduct

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token         Logit
"ử"           -0.07
" strtok"     -0.06
"errupted"    -0.06
" opráv"      -0.06
"kem"         -0.06
" Prim"       -0.06
",...↵↵"      -0.06
"cul"         -0.06
"xl"          -0.06
" prim"       -0.06

Positive Logits

Token          Logit
" actions"     0.08
" action"      0.08
"icular"       0.07
" ACTIONS"     0.07
"ACTION"       0.07
"action"       0.07
"actions"      0.07
" سع"          0.07
"Action"       0.07
" decisions"   0.07

Activations Density 0.012%
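
To put the density figure in context, the short sketch below works out how many token positions it would correspond to. It assumes, without confirmation from the dashboard itself, that the density is the fraction of token positions in the listed prompt set (24,576 prompts of 128 tokens) on which the feature is active. The function names are illustrative and not part of any dashboard API.

```python
# Figures copied from the dashboard fields above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.012  # "Activations Density 0.012%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def approx_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of positions where the feature fires.

    ASSUMPTION: density is measured over this same prompt set as
    (active positions / total positions), expressed as a percentage.
    """
    return total * density_percent / 100


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = approx_active_tokens(total, DENSITY_PERCENT)

print(f"total token positions: {total:,}")
print(f"approx. active positions: {active:.0f}")
```

Under that assumption the feature would fire on only a few hundred of roughly three million positions. Because the displayed density is rounded to three decimal places, the result is an estimate rather than an exact count.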