INDEX

Explanations

concepts relating to reflection and introspection

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token       Logit
"енка"      -0.07
"nesc"      -0.07
"terra"     -0.07
"ephy"      -0.07
"arkin"     -0.07
"ovní"      -0.07
"mods"      -0.07
"ymes"      -0.07
" æ¯"       -0.07
"tera"      -0.06

Positive Logits

Token             Logit
" reflection"     0.14
" Reflection"     0.13
" reflections"    0.13
"reflection"      0.13
" mirrors"        0.13
" mirror"         0.13
"Reflection"      0.13
" Mirror"         0.12
" reflected"      0.12
" reflect"        0.11

Activations Density 0.021%
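
As a rough sanity check, the prompt count, prompt length, and activation density listed on this page can be combined to estimate how many dashboard tokens the feature fires on. This sketch assumes the density is the fraction of all dashboard tokens with a nonzero activation. The page does not state that definition, and the constant and function names are illustrative only.

```python
# Figures copied from this dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.021  # "Activations Density 0.021%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of tokens with nonzero activation.

    Assumption: density is the fraction of tokens on which the
    feature is active, expressed as a percentage.
    """
    return total * density_percent / 100


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"{total:,} tokens in total, roughly {active:.0f} with nonzero activation")
```

Under that assumption the feature is active on only a few hundred of roughly three million tokens, in line with a sparse feature tied to a narrow concept.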