Explanations

disinformation and misinformation

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

北海道    0.54
сного    0.53
家族    0.51
 homozyg    0.51
льного    0.51
 recorrido    0.50
ገል    0.50
त्ति    0.48
夗    0.47
 pregnancies    0.47

Positive Logits

 disinformation    0.73
you    0.64
filtering    0.64
 Propaganda    0.61
 misinformation    0.61
we    0.59
 propaganda    0.59
 your    0.58
 countermeasures    0.58
 Markt    0.57

Activations Density 0.093%
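
To put the density figure in perspective, the sketch below estimates how many tokens the feature would fire on. It assumes that "Activations Density" means the share of tokens with a nonzero activation, and that it was measured over the same 238,145 prompts of 512 tokens listed under Configuration. The dashboard states neither, so treat the result as a rough estimate. The function and variable names are hypothetical.

```python
# Figures copied from the dashboard above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.093  # "Activations Density 0.093%"


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens with nonzero activation).

    Assumption: density is the percentage of all tokens in the prompt set
    on which the feature activates. The dashboard does not confirm this.
    """
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under these assumptions the feature is sparse: it fires on roughly one token in a thousand.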