Explanations

crimes against humanity and violence

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"柠檬"  0.66
"图"  0.65
" સરળ"  0.63
" yardımcı"  0.62
" préférences"  0.61
"流畅"  0.61
" IntelliJ"  0.60
" பரபர"  0.60
"鸟"  0.60
"XML"  0.58

Positive Logits

" atrocities"  1.37
" genocide"  1.30
" killings"  1.30
"rocities"  1.24
" brutality"  1.20
" horrific"  1.16
" massacre"  1.15
" massac"  1.08
" violence"  1.05
" injustices"  1.05

Activations Density 0.149%
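
To put the density figure in concrete terms, here is a rough back-of-the-envelope sketch. It assumes the density was measured over the same prompt set listed under Configuration and that every prompt is exactly 512 tokens long. Neither assumption is confirmed by the dashboard, and the function name is hypothetical.

```python
# Figures copied from the dashboard; how they relate is an assumption.
NUM_PROMPTS = 238_145              # "238,145 prompts"
TOKENS_PER_PROMPT = 512            # "512 tokens each"
ACTIVATION_DENSITY_PERCENT = 0.149 # "Activations Density 0.149%"


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature fires).

    Assumes the density is the fraction of all tokens in the prompt set
    with a nonzero activation (hypothetical reading of the dashboard).
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100
    return total, active


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,.0f}")
```

Under these assumptions the prompt set holds 121,930,240 tokens, so the feature would fire on about 181,676 of them, or roughly one token in every 670.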