INDEX

Explanations

pronoun focus, especially questions about self

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" 관련"  0.84
"infodisc"  0.82
" ваших"  0.80
"เหมือน"  0.79
"躹"  0.78
" noss"  0.78
" 되고"  0.78
"겪"  0.77
" スウェット"  0.77
" เกี่ยว"  0.77

Positive Logits

1.67  "我"
1.48  "we"
1.39  " tôi"
1.22  " আমি"
1.22  " मैं"
1.19  " saya"
1.19  "আমি"
1.13  "me"
1.13

Activations Density 0.053%
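
To give a sense of scale for the density figure, here is a minimal arithmetic sketch. It rests on two assumptions the page does not state: that the density is the share of token positions on which the feature has a nonzero activation, and that it was measured over the same 238,145 prompts of 512 tokens listed under Configuration. The function name is hypothetical and used only for illustration.

```python
# Figures copied from the dashboard text.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.053


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total_tokens, estimated_active_tokens).

    Assumption: density is the percentage of token positions where the
    feature fires, measured over exactly this prompt set.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100
    return total_tokens, active_tokens


total, active = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under those assumptions the prompt set holds about 122 million tokens, and the feature would be active on roughly 65 thousand of them.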