Explanations

disinformation, misinformation, fake news

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token            Logit
" 촬영"           0.44
"Pain"           0.42
" introd"        0.41
"穷"             0.41
"Stim"           0.41
" ছুই"           0.41
" adventurer"    0.41
"algèbre"        0.39
"queleto"        0.39
"🌃"             0.39

Positive Logits

Token               Logit
" disinformation"   1.78
" misinformation"   1.72
" fake"             1.34
" propaganda"       1.34
" Fake"             1.25
"Fake"              1.22
" falsehood"        1.20
"fake"              1.14
" propagand"        1.13
" Propaganda"       1.10

Activations Density 0.024%
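
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It rests on two assumptions that the page does not confirm: that the density is the fraction of dashboard tokens on which the feature activates at all, and that every one of the 238,145 prompts is exactly 512 tokens long. The function name is hypothetical.

```python
# Figures copied from the dashboard text above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY_PERCENT = 0.024


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with non-zero activation).

    Assumption: density is measured over all dashboard tokens and every
    prompt has exactly `tokens_per_prompt` tokens.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = round(total_tokens * density_percent / 100)
    return total_tokens, active_tokens


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY_PERCENT
)
print(f"total tokens:  {total_tokens:,}")
print(f"active tokens: {active_tokens:,} (estimate)")
```

Under those assumptions the feature fires on roughly 29,000 of about 122 million tokens, which fits a narrow topical feature such as this one.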