Explanations

phrases related to moral and ethical judgments

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token       Logit
"lero"      -0.07
"rip"       -0.07
"ớt"        -0.07
"_preds"    -0.06
"šti"       -0.06
"наче"      -0.06
"гл"        -0.06
"instead"   -0.06
" ضمن"      -0.06
" pena"     -0.06

Positive Logits

Token          Logit
" physical"    0.15
"physical"     0.14
" Physical"    0.13
" overt"       0.13
" direct"      0.13
" directly"    0.12
"Physical"     0.12
"direct"       0.11
" obvious"     0.11
" physically"  0.11

Activations Density 0.058%
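
As a rough sanity check, the density figure can be combined with the prompt configuration listed under Configuration to estimate how many token positions the feature fires on. This is only a sketch: it assumes "Activations Density" means the fraction of token positions in the dashboard prompts where the feature has a nonzero activation, which the page itself does not define. The function name is hypothetical.

```python
# Figures copied from the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.058


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total token positions, estimated positions with nonzero activation).

    Assumption: density is the share of all token positions on which the
    feature activates, expressed as a percentage.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100.0
    return total, active


total, active = estimate_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} token positions, roughly {active:,.0f} with nonzero activation")
```

Under that assumption the feature is active on a little over 1,800 of about 3.1 million token positions, so the logit lists describe a feature that fires rarely.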