Explanations

phrases expressing skepticism or critique towards authority and societal norms

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

-0.06

 role

-0.06

 bias

-0.06

Wor

-0.05

 background

-0.05

 included

-0.05

ombs

-0.05

Jew

-0.05

Âł

-0.05

Positive Logits

rysler

0.08

tility

0.08

 Incontri

0.08

íĨłíĨł

0.08

odash

0.07

asin

0.07

everything

0.07

ItemList

0.07

StackSize

0.07

á»ģ

0.07

Activations Density 0.016%
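
To put the density figure in context, the short sketch below converts it into an approximate token count. It assumes, which the dashboard does not state, that the activation density was measured over the same 24,576 prompts of 128 tokens each listed under Configuration. The function and constant names are illustrative only.

```python
# Figures copied from the dashboard; the link between them is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.016


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of token positions where the feature is active.

    Assumes the density percentage was computed over `total` positions.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total token positions: {total:,}")
print(f"approx. active positions: {active:.0f}")
```

Under that assumption the feature fires on roughly 500 of about 3.1 million token positions, which is consistent with a sparse feature.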