INDEX

Explanations

sensitive topics and policy violations

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token  Value
"GTBase"  0.46
(blank)  0.45
"ிருக்கிறார்"  0.44
" سلاټونه"  0.43
" χρή"  0.43
"'`--'`--"  0.42
" Εκ"  0.42
"𝙧"  0.41
"foregroundView"  0.41
" θέ"  0.41

Positive Logits

Token  Value
(missing)  0.55
" with"  0.50
" kaum"  0.46
"et"  0.44
" solely"  0.42
" database"  0.41
"ל"  0.41
" this"  0.41
" same"  0.40
"too"  0.40

Activations Density 0.000%