Explanations

truthfulness and misinformation

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token             Logit
"greedy"          0.53
" greed"          0.47
"pyg"             0.47
" greedy"         0.46
"த்தனை"           0.44
" industriales"   0.44
"kinase"          0.44
"狎"              0.43
"ANSAS"           0.42
"Prepar"          0.41

Positive Logits

Token               Logit
" disinformation"   0.98
" misinformation"   0.95
" Opinions"         0.89
" Opinion"          0.88
" opinions"         0.88
" opinion"          0.86
" debunk"           0.82
" defamation"       0.77
" dissenting"       0.76
" partisan"         0.75

Activations Density: 0.382%
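
As a rough sense of scale, the sketch below multiplies out the dashboard prompt set and applies the density figure. It assumes, and the page does not state this, that activation density is the fraction of dashboard tokens on which the feature has a nonzero activation. The variable names are illustrative only.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.382  # "Activations Density"

# Total number of tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: density = share of tokens where the feature is active.
estimated_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens:            {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under that assumption the feature fires on only a small slice of the corpus, which fits a feature tied to a narrow topic such as misinformation and opinion.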