Explanations

contempt and insults towards others

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token    Logit
 muitas    0.43
 увлека    0.42
真的    0.40
✤    0.40
快速    0.40
 রূপান্তরিত    0.40
多く    0.39
埝    0.39
 வாழ்    0.39
amiliar    0.38

Positive Logits

Token    Logit
 idiots    1.56
 stupid    1.55
 idiot    1.46
 stupidity    1.32
 fools    1.23
 foolish    1.23
 disgusting    1.20
 ridiculous    1.13
 Stupid    1.10
 silly    1.08

Activations Density 0.028%