Explanations

actions and states

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

destino     0.43
Netflix     0.43
CONDUCT     0.43
Axis        0.42
|_{         0.40
か          0.40
conduct     0.39
Ӏо          0.39
Compress    0.39
innest      0.39

Positive Logits

 disputed       0.42
 disclosing     0.42
 earthy         0.42
 protection     0.41
挾              0.40
 discloses      0.40
 undisclosed    0.39
 bodyguard      0.39
 disclose       0.38
 guarding       0.38

Activations Density: 0.002%