INDEX

Explanations

phrases related to emotional safety and belonging

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

elper         -0.07
>manual       -0.06
dot           -0.06
avor          -0.06
 궁           -0.06
 divis        -0.06
ocu           -0.06
тро           -0.06
anj           -0.06
olec          -0.06

Positive Logits

 safe         0.08
-safe         0.07
safe          0.07
 Safe         0.07
 identities   0.07
 relaxation   0.07
Safe          0.07
 safely       0.07
identity      0.07
 Identity     0.06

Activations Density 0.026%
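
The density figure can be related back to the prompt configuration with a little arithmetic. The sketch below is only a rough estimate: it assumes that "Activations Density" means the fraction of all dashboard tokens on which the feature fires, which the page itself does not state. The function names are hypothetical and chosen for illustration.

```python
# Rough estimate of how many tokens activate this feature.
# ASSUMPTION: "Activations Density" is the share of all dashboard tokens
# (prompts x tokens per prompt) with a nonzero activation. The page does
# not define the term, so treat the result as an approximation.

NUM_PROMPTS = 24_576        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128     # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.026     # from "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(
    num_prompts: int, tokens_per_prompt: int, density_percent: float
) -> float:
    """Approximate count of tokens on which the feature is active."""
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100.0


if __name__ == "__main__":
    print(total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT))
    print(round(estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)))
```

Under that assumption, the feature is active on roughly 800 of about 3.1 million tokens, which is consistent with a sparse, narrowly scoped feature.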