INDEX

Explanations

describing negative situations

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"amerika"  0.42
" নিদ্র"  0.41
" sün"  0.40
"ட்சத்திர"  0.40
" indeks"  0.40
"ैटिन"  0.39
" खाते"  0.38
"鮋"  0.38
"notice"  0.38
"Paj"  0.37

Positive Logits

" farce"  0.50
" overcrowding"  0.46
" happening"  0.45
" unbearable"  0.44
" ناقابل"  0.44
" intolerable"  0.43
" walls"  0.43
" absurdity"  0.43
" wretched"  0.42
" curfew"  0.42
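The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. One common way to produce such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix. This page does not confirm that method, so treat it as an assumption. The sketch below also ignores the final LayerNorm. The decoder row, unembedding matrix and vocabulary are hypothetical toy values, and `top_logit_effects` is an illustrative helper, not a library function.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed (hypothetical) shapes:
      w_dec_row: decoder direction of one feature, shape (d_model,)
      W_U:       unembedding matrix, shape (d_model, vocab_size)
      vocab:     list of token strings, length vocab_size
    """
    effects = w_dec_row @ W_U        # one logit effect per vocabulary token
    order = np.argsort(effects)      # ascending: most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
w_dec_row = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [0.0,  0.0, 0.0,  0.0]])
vocab = [" farce", "notice", " curfew", "amerika"]

positive, negative = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print(positive)  # [(' farce', 0.5), (' curfew', 0.1)]
print(negative)  # [('amerika', -0.4), ('notice', -0.2)]
```

The sketch returns signed values, so the negative list comes out as negative numbers. The dashboard shows the negative list unsigned, presumably as magnitudes.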

Activation Density: 0.039%
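Activation density is the share of token positions on which the feature fires. The sketch below rests on two assumptions that the page does not state. First, it assumes density is measured over every token of the dashboard dataset (238,145 prompts of 512 tokens each). Second, it assumes "fires" means an activation above zero. Under those assumptions, the sketch reproduces the ratio and converts it into an approximate count of active tokens. `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    return float(np.mean(acts > threshold))

# Hypothetical activations for 8 token positions; 2 of them are active.
toy_acts = np.array([0.0, 0.0, 1.2, 0.0, 0.3, 0.0, 0.0, 0.0])
print(activation_density(toy_acts))  # 0.25

# Scale the reported 0.039% density to the dashboard dataset size.
TOTAL_TOKENS = 238_145 * 512               # 121,930,240 token positions
expected_active = TOTAL_TOKENS * 0.039 / 100
print(f"{expected_active:,.0f}")           # 47,553
```

Under these assumptions, the feature is active on roughly 47,500 token positions, or about one token in 2,560.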