INDEX

Explanations

statements involving personal pronouns and their associated actions or thoughts

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

isÃŃ

-0.08

ÄĻ

-0.07

kowski

-0.07

(æĹ¥

-0.07

odate

-0.07

rone

-0.07

olle

-0.07

erece

-0.07

Ì§

-0.07

à¸

-0.07

Positive Logits

eter

0.07

 ultimate

0.07

 term

0.07

ulet

0.06

aves

0.06

 incident

0.06

urd

0.06

 must

0.06

anger

0.06

can

0.06

Activations Density 0.033%