Explanations

confidently fabricated information

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

描写  0.66  (Chinese/Japanese: "depict, describe")
 داری  0.66
ণিজ্য  0.65
ጾ  0.65
 Window  0.65
submenu  0.64
Dom  0.64
型  0.63  (Chinese/Japanese: "type, model")
 Interrupt  0.63
 programming  0.63

Positive Logits

 falsehood  2.08
 disinformation  1.89
 debunk  1.78
 hoax  1.75
 credibility  1.74
 거짓  1.72  (Korean: "lie, falsehood")
 skepticism  1.72
 disbelief  1.72
 misinformation  1.72
 myths  1.68
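Dashboards of this kind commonly derive positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the largest and smallest entries. The following is a minimal sketch under that assumption. The function name `top_logits`, its arguments `w_dec_row` and `w_u`, and the choice to ignore the final layer norm are illustrative assumptions, not this dashboard's documented pipeline.

```python
import numpy as np


def top_logits(w_dec_row, w_u, vocab, k=10):
    """Return the k highest and k lowest vocabulary logits for one feature.

    Assumptions (illustrative, not this dashboard's documented method):
    - w_dec_row is the feature's decoder direction, shape [d_model]
    - w_u is the model's unembedding matrix, shape [d_model, d_vocab]
    - the final layer norm before unembedding is ignored
    """
    logits = w_dec_row @ w_u  # direct effect of the feature on each token, shape [d_vocab]
    order = np.argsort(logits)  # indices sorted from lowest to highest logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example (made-up weights): d_model = 2, a four-token vocabulary.
vocab = [" falsehood", " Window", " hoax", " myths"]
w_u = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
positive, negative = top_logits(np.array([1.0, 0.0]), w_u, vocab, k=2)
```

With these toy weights, the positive list is led by " falsehood" and the negative list by " Window". Real values depend entirely on the model's actual weights.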

Activation Density: 0.378%
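Activation density is the fraction of token positions on which the feature is active. This minimal sketch assumes that "active" means an activation strictly above a threshold of 0, and that density is measured over every token in the configured prompt set (238,145 prompts of 512 tokens each). `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
def activation_density(acts, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "active" means strictly greater than `threshold` (default 0).
    """
    if not acts:
        raise ValueError("acts must be non-empty")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Assumption: density is measured over every token in the configured prompt set.
total_tokens = 238_145 * 512  # 121,930,240 token positions
expected_active = total_tokens * 0.378 / 100  # active positions implied by a 0.378% density
```

Under that assumption, a density of 0.378% over 121,930,240 tokens corresponds to roughly 461,000 active positions.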