Explanations

words related to moral and ethical judgments

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

"ÄħÅ¼"       -0.08
"-Ä±"        -0.07
" Walsh"     -0.07
"lify"       -0.06
"gaard"      -0.06
"atori"      -0.06
"iazza"      -0.06
"avec"       -0.06
"unden"      -0.06
"ModelError" -0.06

Positive Logits

"ed"    0.11
"sa"    0.08
"do"    0.07
"ged"   0.07
"si"    0.07
"ing"   0.07
"ingly" 0.07
"us"    0.07
"vo"    0.07

Activations Density 0.005%
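
As a rough sense of scale, the density figure can be combined with the prompt configuration listed above. The sketch below is only an illustration: it assumes the density is the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens each. Neither assumption is confirmed by this page, and the variable names are made up for the example.

```python
# Figures quoted from the dashboard: 24,576 prompts, 128 tokens each,
# Activations Density 0.005%.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.005

total_tokens = num_prompts * tokens_per_prompt

# Assumption (not stated on the page): density is the share of tokens with a
# nonzero activation, measured over this same prompt set.
expected_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"expected active tokens: {expected_active_tokens:.0f}")
```

Under those assumptions the feature would fire on roughly 157 of about 3.1 million tokens, which fits a very sparse feature.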