Explanations

false negatives, code for, differences between

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 самой    0.34
ংকার    0.34
 არის    0.32
Ს    0.32
 طرز    0.31
Total    0.31
 moniker    0.31
근    0.30
 totalitarian    0.30
ाये    0.30

Positive Logits

 crucially    0.32
 mentally    0.32
 Cour    0.31
 Rockets    0.30
 Cruises    0.29
 Cork    0.29
 plus    0.28
closing    0.28
 Yates    0.28
while    0.28

Activations Density: 0.001%