Explanations

phrases that contain various forms of self-reflection and self-awareness

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

ato  -0.07
الي  -0.06
ESCO  -0.06
atos  -0.06
esco  -0.06
uito  -0.06
arters  -0.06
우  -0.06
alnız  -0.05
णन  -0.05

Positive Logits

RIA  0.08
 typical  0.08
 modern  0.07
 moderne  0.07
ovic  0.07
 соврем  0.07
现代  0.07
 Modern  0.06
 Reeves  0.06
azzo  0.06
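The page does not say how these logit lists were computed. A common approach in SAE dashboards, assumed here and not confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the lowest and highest scores per token. The sketch below does this in plain Python. The helper `logit_effects`, the 2-dimensional vectors, and the 4-token vocabulary are all hypothetical toy values, not data from this feature.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score each token by projecting a feature's decoder
    direction (length d_model) onto the unembedding matrix (d_model x vocab_size,
    one column per token), then return the k lowest and k highest scores."""
    d_model = len(decoder_dir)
    # score[t] = sum over i of decoder_dir[i] * unembed[i][t]
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, largest first
    return negative, positive


# Toy example with made-up numbers (d_model = 2, vocab of 4 tokens).
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.1, 0.0, -0.2, 0.3],
             [0.0, 0.2, 0.1, -0.1]],
    vocab=[" modern", "ato", "esco", " typical"],
    k=2,
)
print([(tok, round(s, 2)) for tok, s in neg])  # [('esco', -0.25), ('ato', -0.1)]
print([(tok, round(s, 2)) for tok, s in pos])  # [(' typical', 0.35), (' modern', 0.1)]
```

In a real model the decoder direction and unembedding would come from the SAE and the language model's weights, and some pipelines apply the final layer norm before projecting, which this sketch omits.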

Activation Density: 0.089%
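The configuration and density figures can be combined into a quick sanity check. This assumes, without confirmation from the page, that the 0.089% density is measured over the same 24,576 prompts of 128 tokens used for the dashboard; if it was measured on a different sample, only the total-token figure applies.

```python
# Dashboard configuration, copied from the page above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.089 / 100  # 0.089% expressed as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = fraction of these dashboard tokens on which the feature fires.
expected_active_tokens = total_tokens * ACTIVATION_DENSITY

print(f"total tokens: {total_tokens:,}")                         # total tokens: 3,145,728
print(f"expected active tokens: ~{expected_active_tokens:,.0f}")  # expected active tokens: ~2,800
```

Under that assumption the feature fires on roughly 2,800 of about 3.1 million tokens, or about one token in 1,100, which makes it a sparse feature.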