INDEX

Explanations

joy, score, augmentation, descriptive

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

</h2>    0.43
InitStruct    0.43
 Histogram    0.40
 Flow    0.40
 Prefer    0.40
 preferring    0.40
prefer    0.39
 предпочита    0.39
 Follow    0.38
FP    0.38

Positive Logits

幄    0.46
льнай    0.43
ியும்    0.41
ське    0.40
тельный    0.39
 lini    0.38
ною    0.38
ॉर्ड    0.38
騏    0.37
<unused19>    0.37

Activations Density 0.000%