INDEX

Explanations

phrases related to distinguishing fact from fiction

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token    Logit
сол      -0.08
eum      -0.08
rž       -0.08
ICAST    -0.08
alf      -0.08
eut      -0.08
rám      -0.08
SURE     -0.08
peč      -0.08
…"↵↵     -0.07

Positive Logits

Token          Logit
 everything    0.07
 quot          0.06
/of            0.05
 собою         0.05
 reality       0.05
 increasingly  0.05
/or            0.05
ernet          0.05
 ideology      0.05
 logos         0.05

Activations Density 0.039%
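
For a rough sense of scale, the density figure can be related to the prompt configuration listed above. The sketch below assumes that density means the fraction of tokens on which the feature is non-zero, and that it was measured over a sample the size of the dashboard prompt set. Neither assumption is confirmed by this page, and the reported percentage is rounded, so the result is only an order-of-magnitude estimate.

```python
# Assumption: density = (tokens where the feature fires) / (all tokens sampled),
# and the sample is the dashboard prompt set (24,576 prompts x 128 tokens).
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.039  # as reported on the page (rounded)

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
estimated_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: ~{estimated_active_tokens:,}")
```

Under these assumptions the feature would fire on roughly a thousand tokens out of about three million, which fits a sparse, narrowly scoped feature.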