INDEX

Explanations

hate speech, artifacts, units, programming language constructs

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token (quoted to show leading spaces)  Logit
" وغیرہ"  1.94
"。）"  1.92
").]"  1.78
" posteriormente"  1.77
"。)"  1.64
" generalmente"  1.60
")]."  1.58
"經常"  1.57
" fréquemment"  1.56
".))."  1.54

Positive Logits

Token (quoted to show leading spaces)  Logit
" blissful"  2.24
" reinvent"  2.18
" timeless"  2.17
" dreamy"  2.16
" transformative"  2.15
" defiant"  2.15
" pristine"  2.13
" soulful"  2.12
"打造"  2.12
" exhilarating"  2.11

Activations Density 1.480%
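
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the toy values are hypothetical and are not taken from the dashboard.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    `activations` holds one sequence of per-token activation values per prompt.
    This is an assumed definition of density, not the dashboard's documented one.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    # Guard against an empty input so we never divide by zero.
    return active / total if total else 0.0


# Toy data: 2 prompts of 4 tokens each (hypothetical values, not dashboard data).
toy = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
density = activation_density(toy)
print(f"{density:.3%}")  # 1 active token out of 8
```

Under this assumed definition, a density of 1.480% would mean the feature is active on roughly 1.5 of every 100 tokens in the dashboard prompts.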