Explanations

ethical, safe, or harmful prompts

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"ان"    1.59
"ن"    1.49
"न"    1.43
"ل"    1.36
"на"    1.29
"ور"    1.28
"ak"    1.16
"ной"    1.13
"ва"    1.11
"ే"    1.11

Positive Logits

(token missing)    1.04
(token missing)    1.00
"ς"    0.96
" MyDB"    0.94
" koska"    0.94
" Comme"    0.94
" muita"    0.91
" sodass"    0.91
" osob"    0.88
" doen"    0.88

Activations Density 0.028%
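
For a rough sense of scale, the density can be combined with the dashboard configuration. The sketch below assumes that the density is the fraction of token positions with a nonzero activation, measured over the same 238,145 prompts of 512 tokens each. The page does not state this, so treat the result as an order-of-magnitude estimate only. The function names are illustrative and not part of any dashboard API.

```python
def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the dashboard dataset."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of positions where the feature fires.

    Assumption: density_percent is the share of all token positions
    with a nonzero activation, expressed as a percentage.
    """
    return round(total * density_percent / 100)


# Values taken from the dashboard text above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.028

total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under that assumption, the feature would fire on a few tens of thousands of the roughly 122 million token positions.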