Explanations

continuations or results after punctuation

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

！！！    0.28
！！！！    0.26
 !!!!!    0.25
 !!!!    0.25
］    0.25
!!!!    0.24
 gemstones    0.24
۔    0.24
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵    0.23
所以我    0.23

Positive Logits

 Responsible    0.28
 Interestingly    0.28
同樣    0.28
 опять    0.27
 сможет    0.26
 Steering    0.25
 Again    0.25
 again    0.25
สามารถ    0.25
 может    0.24

Activations Density 1.390%