Explanations

positive descriptions of people's friendliness and helpfulness

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

wang

-0.07

ame

-0.06

zie

-0.06

(&(

-0.06

AME

-0.06

eggies

-0.06

ponsored

-0.06

phies

-0.05

émon

-0.05

Positive Logits

 friendly

0.10

friendly

0.09

 Friendly

0.09

 hospitality

0.08

-friendly

0.08

Friendly

0.08

 welcoming

0.08

 genuine

0.07

 staff

0.07

 Helpful

0.07

Activations Density 0.056%
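
To put the density figure in context, the short sketch below turns the numbers listed on this page into rough token counts. It assumes, and the page does not confirm this, that the activation density is the fraction of tokens in the dashboard prompts on which the feature fires. The variable names are illustrative only.

```python
# Figures copied from the dashboard configuration on this page.
num_prompts = 24_576
tokens_per_prompt = 128
activation_density_pct = 0.056  # reported as a percentage

# Total number of tokens across all dashboard prompts.
total_tokens = num_prompts * tokens_per_prompt

# ASSUMPTION: the density was measured over these same dashboard tokens.
# If it was computed on a different sample, this estimate does not apply.
estimated_active_tokens = total_tokens * activation_density_pct / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {estimated_active_tokens:,.0f}")
```

Under that assumption the feature is sparse: it would fire on fewer than two thousand of roughly 3.1 million tokens.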