Explanations

code comments and structure

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token          Logit
"狀態"          0.46
" Till"        0.42
"Two"          0.41
" Mout"        0.40
" Stockton"    0.40
" Downing"     0.39
" Millet"      0.39
"笤"            0.38
" दोन"          0.38
"反馈"          0.38

Positive Logits

Token             Logit
" bytes"          0.39
" hypotheses"     0.38
"მ"               0.37
" stereotypes"    0.37
" misused"        0.36
" unsuitable"     0.36
" abuse"          0.36
"Allocator"       0.35
"Jährige"         0.35
"bytes"           0.35

Activations Density 0.000%