INDEX

Explanations

instances of power dynamics and compliance in interactions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

ighth

-0.07

OfClass

-0.07

ahan

-0.06

카라

-0.06

арч

-0.06

ETYPE

-0.06

terra

-0.06

ittest

-0.06

ラス

-0.06

clamation

-0.06

Positive Logits

 kabul

0.09

 accept

0.09

 tempor

0.09

 ACCEPT

0.09

 acceptance

0.08

 accepts

0.08

 agree

0.08

 Accept

0.08

_accept

0.08

 accepted

0.08

Activations Density 0.027%
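
To put the density figure in context, the arithmetic can be sketched as below. This is only a rough estimate: it assumes the 0.027% density was measured over the same 24,576 prompts of 128 tokens listed under Configuration, which the dashboard does not state explicitly. The function and constant names are illustrative, not part of any tool's API.

```python
# Figures copied from the dashboard; names are illustrative only.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.027


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of token positions where the feature is non-zero.

    Assumption: density is the fraction of token positions with a non-zero
    activation, measured over the same prompt set.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under that assumption the feature fires on roughly 850 of about 3.1 million token positions, which fits a sparse feature tied to a narrow set of tokens such as the "accept" variants listed under Positive Logits.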