INDEX

Explanations

indoctrination and mind control

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token             Logit
" scra"           0.43
" hiệp"           0.42
" ನೆಲ"            0.41
" respectful"     0.41
" respectfully"   0.40
" scrapes"        0.40
"낑"              0.40
" ಗೌ"             0.39
"딪"              0.39
"썽"              0.39

Positive Logits

Token             Logit
" indoctr"        1.75
" propaganda"     1.61
" indoct"         1.55
" propagand"      1.43
" Propaganda"     1.39
" manipulation"   1.23
" conditioning"   1.20
" brain"          1.17
" Manipulation"   1.16
" manipulated"    1.11

Activations Density 0.052%
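
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes, without confirmation from this page, that the density is the fraction of tokens on which the feature is active and that it was measured over the dashboard prompt set listed under Configuration. The function name `estimate_active_tokens` is hypothetical and only illustrates the arithmetic.

```python
def estimate_active_tokens(num_prompts: int,
                           tokens_per_prompt: int,
                           density_percent: float) -> tuple[int, float]:
    """Return (total tokens, estimated number of tokens where the feature fires).

    Assumption: density_percent is the share of all tokens with a nonzero
    activation, taken over num_prompts * tokens_per_prompt tokens.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


# Figures taken from the page: 238,145 prompts, 512 tokens each, density 0.052%.
total, active = estimate_active_tokens(238_145, 512, 0.052)
print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:,.0f}")
```

Under those assumptions the feature would be active on roughly 63 thousand of about 122 million tokens. If the density was computed over a different corpus, the count changes accordingly.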